(57) Abstract:

PROBLEM TO BE SOLVED: To analyze a biological sequence, such as a
nucleic acid sequence, with a computer system by inputting and
analyzing a plurality of base calls for each base position along a
portion of a sample nucleic acid sequence, and by generating and
displaying a single base call.

SOLUTION: In a method of analyzing a sample nucleic acid sequence in
a computer system, a plurality of base calls for each base position
along at least a portion of the sequence is input to a computer 100.
The base call that occurs most often among the plurality of base
calls is obtained by a database 102, and a single base call is
generated as the base call with the highest frequency of occurrence
and input to a chip design file 104. A pattern is arranged on a mask
110, and the chip design file 104 and an image file 124 are analyzed.
The single base calls, each derived from the plurality of base calls
for a specific base position in the sample nucleic acid sequence, are
displayed as an output 128.
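
The majority rule described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `consensus_base_calls`, the list-of-lists input layout, and the use of "N" for ties or empty positions are all assumptions made for the example.

```python
from collections import Counter


def consensus_base_calls(calls_per_position):
    """Return one base call per position: the call that occurs most often.

    calls_per_position is a list of lists, e.g. [["A", "A", "C"], ...].
    Ties and empty positions are reported as "N" (an assumption of this
    sketch; the source does not say how ties are resolved).
    """
    consensus = []
    for calls in calls_per_position:
        if not calls:
            consensus.append("N")
            continue
        counts = Counter(calls).most_common()
        # A tie for first place means there is no single most frequent call.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus.append("N")
        else:
            consensus.append(counts[0][0])
    return consensus
```

For example, three calls "A", "A", "C" at one position would yield the single call "A".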
[0115] The hybridization intensities of each probe pair are compared
using a difference threshold (D) and a ratio threshold (R).

[0116] If (Ipm - Imm) >= D and Ipm/Imm >= R, NPOS is incremented by 1
at step 960.

[0117] At step 962, it is determined whether (Imm - Ipm) >= D and
Imm/Ipm >= R. If so, NNEG is incremented at step 964.

[0118] For each probe pair, a log ratio (LR) and an intensity
difference (IDIF) are computed. LR is the logarithm of the ratio of
the hybridization intensities of the probe pair (Ipm/Imm). IDIF is the
difference between the hybridization intensities of the probe pair
(Ipm - Imm).

[0119] The values N, NPOS, NNEG, and LR are then used to compute the
following P values.
[0120] P1 = NPOS/NNEG
P2 = NPOS/N
P3 = (10 * SUM(LR)) / (NPOS + NNEG)




[0123] A = (P1 >= 2.1)
B = (2.1 > P1 >= 1.8)
C = (P1 < 1.8)

[0124] X = (P2 >= 0.35)
Y = (0.35 > P2 >= 0.20)
Z = (P2 < 0.20)

[0125] Q = (P3 >= 1.5)
R = (1.5 > P3 >= 1.1)
S = (P3 < 1.1)

[0127] The gene is indicated as expressed if the expression
"A and (X or Y) and (Q or R)" is true, that is, if P1 >= 2.1,
P2 >= 0.20, and P3 >= 1.1. The gene is also indicated as expressed if
the expression "B and X and Q" is true.

* [0129] 



[013 0 ] 



[013!] 



A and (X or Y) and (Q or R) 
B and X and Q 

A and X and S 
B and X and R 
B and Y and (Q or R) 
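
The P values, threshold classes, and decision matrix above can be combined into a short sketch. Several things here are assumptions rather than statements from the text: the function name `expression_call`, the handling of zero denominators, the 0.35 boundary for class X, and the mapping of the first two expressions to "expressed" and the next three to "marginal" (with "absent" as the fallback, following the outcomes named in the claims).

```python
import math


def expression_call(npos, nneg, n, lr_sum):
    """Classify a gene from probe-pair counts (hypothetical helper).

    npos, nneg: counts of positive and negative probe pairs
    n:          total number of probe pairs (must be > 0)
    lr_sum:     SUM(LR) over the probe pairs
    """
    # Zero denominators are not addressed in the text; these fallbacks
    # are assumptions of this sketch.
    p1 = npos / nneg if nneg else math.inf
    p2 = npos / n
    p3 = 10 * lr_sum / (npos + nneg) if (npos + nneg) else 0.0

    a, b = p1 >= 2.1, 1.8 <= p1 < 2.1
    x, y = p2 >= 0.35, 0.20 <= p2 < 0.35
    q, r, s = p3 >= 1.5, 1.1 <= p3 < 1.5, p3 < 1.1

    # First group of expressions (assumed to mean "expressed").
    if (a and (x or y) and (q or r)) or (b and x and q):
        return "expressed"
    # Second group of expressions (assumed to mean "marginal").
    if (a and x and s) or (b and x and r) or (b and y and (q or r)):
        return "marginal"
    return "absent"
```

With 15 positive and 2 negative pairs out of 20 and SUM(LR) = 3.0, P1 = 7.5, P2 = 0.75, and P3 is about 1.76, so the sketch returns "expressed".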



[0157] When NINC >= NDEC, the following P values are computed.

[0158] P1 = NINC/NDEC
P2 = NINC/N
P3 = ((NPOSE - NPOSB) - (NNEGE - NNEGB)) / N
P4 = 10 * SUM(LRE - LRB) / N

[0161] A = (P1 >= 2.8)
B = (2.8 > P1 >= 2.0)
C = (P1 < 2.0)

[0162] X = (P2 >= 0.34)
Y = (0.34 > P2 >= 0.24)
Z = (P2 < 0.24)

[0163] M = (P3 >= 0.20)
N = (0.20 > P3 >= 0.12)
O = (P3 < 0.12)

[0164] Q = (P4 >= 0.9)
R = (0.9 > P4 >= 0.5)
S = (P4 < 0.5)


[0167] 

A and (X or Y) and (Q or R) and (M or N or O)
A and (X or Y) and (Q or R or S) and (M or N)
B and (X or Y) and (Q or R) and (M or N)
A and X and (Q or R or S) and (M or N or O)

A or Y or S or O

B and (X or Y) and (Q or R) and O
B and (X or Y) and S and (M or N)
C and (X or Y) and (Q or R) and (M or N)
[0171] When NINC < NDEC, the following P values are computed.

[0172] P1 = NDEC/NINC
P2 = NDEC/N
P3 = ((NNEGE - NNEGB) - (NPOSE - NPOSB)) / N
P4 = 10 * SUM(LRE - LRB) / N

[0176]
A and (X or Y) and (Q or R) and (M or N or O)
A and (X or Y) and (Q or R or S) and (M or N)
B and (X or Y) and (Q or R) and (M or N)
A and X and (Q or R or S) and (M or N or O)

A or Y or S or O

B and (X or Y) and (Q or R) and O
B and (X or Y) and S and (M or N)
C and (X or Y) and (Q or R) and (M or N)
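
The two sets of P values (one for NINC >= NDEC, one for NINC < NDEC) and the first group of matrix expressions can be sketched together. This is an illustration under stated assumptions: the function name `change_call` is hypothetical, the zero-denominator fallback is not from the text, the third row is read as beginning with class B, and the labels "increased" and "decreased" for the first group are inferred from the outcomes listed in the claims. The remaining rows of the matrix are left out because their outcome labels cannot be recovered from the text.

```python
import math


def change_call(ninc, ndec, n, nposb, npose, nnegb, nnege, lr_diff_sum):
    """Evaluate the first group of change expressions (hypothetical helper).

    lr_diff_sum is SUM(LRE - LRB); n is the number of probe pairs (> 0).
    Suffix b denotes baseline values and suffix e experiment values.
    """
    rising = ninc >= ndec
    if rising:
        p1 = ninc / ndec if ndec else math.inf  # fallback is an assumption
        p2 = ninc / n
        p3 = ((npose - nposb) - (nnege - nnegb)) / n
    else:
        p1 = ndec / ninc if ninc else math.inf  # fallback is an assumption
        p2 = ndec / n
        p3 = ((nnege - nnegb) - (npose - nposb)) / n
    p4 = 10 * lr_diff_sum / n

    a, b = p1 >= 2.8, 2.0 <= p1 < 2.8
    x, y = p2 >= 0.34, 0.24 <= p2 < 0.34
    m, nn, o = p3 >= 0.20, 0.12 <= p3 < 0.20, p3 < 0.12  # nn is class N
    q, r, s = p4 >= 0.9, 0.5 <= p4 < 0.9, p4 < 0.5

    first_group = (
        (a and (x or y) and (q or r) and (m or nn or o))
        or (a and (x or y) and (q or r or s) and (m or nn))
        or (b and (x or y) and (q or r) and (m or nn))
        or (a and x and (q or r or s) and (m or nn or o))
    )
    if not first_group:
        return "other"
    return "increased" if rising else "decreased"
```

Class N is named `nn` in the code only to avoid a clash with the probe-pair count `n`.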
[0181] average(IDIFB) >= 200
average(IDIFE) >= 200
1.4 >= average(IDIFE) / average(IDIFB) >= 0.7

1 Title of Invention

COMPUTER-AIDED TECHNIQUES FOR ANALYZING BIOLOGICAL
SEQUENCES

2 Claims

1. In a computer system, a method of analyzing a sample nucleic
acid sequence, the method comprising the steps of:

inputting a plurality of base calls for each base position along at least a
portion of the sample nucleic acid sequence;

for each base position, analyzing the plurality of base calls to generate
a single base call; and

displaying single base calls for base positions along the at least a
portion of said sample nucleic acid sequence, each of the single base calls being
derived from the plurality of base calls for a specific base position in the sample
nucleic acid sequence.

2. The method of claim 1, wherein the analyzing step comprises
the steps of:

for each base position, determining a base call of the plurality of base
calls which occurs most often; and

generating the base call as the base call that occurs most often at
the base position.

3. The method of claim 1, further comprising the step of
displaying a screen icon which when activated by a user causes the plurality of base
calls at each base position to be displayed.

4. The method of claim 1, further comprising the step of
displaying a screen icon which when activated by a user causes the plurality of base
calls at each base position not to be displayed.

5. The method of claim 1, further comprising the step of
displaying the plurality of base calls at each base aligned with the single base calls
according to base position.

6. The method of claim 5, further comprising the step of
displaying with each base call of the plurality of base calls hybridization intensities
indicating hybridization affinity of a probe and the sample nucleic acid sequence,
wherein each base call is determined by an analysis of the hybridization intensities.

7. In a computer system, a method of calling an unknown base in
a sample nucleic acid sequence, the method comprising the steps of:

receiving hybridization intensities for a plurality of sets of nucleic acid
probes, each hybridization intensity indicating a hybridization affinity between a
nucleic acid probe and the sample nucleic acid sequence;

computing a base call for the unknown base for each set of probes; and

computing a single base call for the plurality of sets of probes
according to the base call for the unknown base which occurs most often for the
plurality of sets of probes.

8. The method of claim 7, wherein each set of probes was
generated according to a same reference sequence.

9. The method of claim 7, further comprising the step of checking
exception rules that specify the single base call for the plurality of sets of nucleic
acid probes under certain conditions.

10. In a computer system, a method of dynamically changing
parameters for a computer-implemented base calling procedure, the method
comprising the steps of:

generating base calls for at least a portion of a sample nucleic acid
sequence utilizing the base calling procedure, the base calling procedure including a
parameter that is changeable by a user;

displaying the base calls for the at least a portion of a sample nucleic
acid sequence;

displaying the parameter of the base calling procedure;

receiving input from the user specifying a new value for the parameter
of the base calling procedure;

generating updated base calls for the at least a portion of a sample
nucleic acid sequence utilizing the base calling procedure and the new value for the
parameter; and

displaying the updated base calls for the at least a portion of a sample
nucleic acid sequence.

11. The method of claim 10, further comprising the step of
displaying a plurality of user-changeable parameters for the base calling procedure.

12. The method of claim 10, wherein the parameter is selected from
the group consisting of a constant, threshold, and range.

13. In a computer system, a method of monitoring expression of a
gene in a sample nucleic acid sequence, the method comprising the steps of:

inputting a plurality of hybridization intensities of pairs of perfect
match and mismatch probes, the perfect match probes being perfectly
complementary to the gene and the mismatch probes having at least one
mismatch with the gene, and the hybridization intensities indicating hybridization
affinity between the perfect match and mismatch probes and the sample nucleic
acid sequence;

comparing the hybridization intensities of each pair of perfect match
probes in order to generate a gene expression call of the sample nucleic acid
sequence; and

displaying the gene expression call.

14. The method of claim 13, further comprising the step of
comparing a difference between hybridization intensities of perfect match and
mismatch probes at a base position to a difference threshold.

15. The method of claim 13, further comprising the step of
comparing a quotient of hybridization intensities of perfect match and mismatch
probes at a base position to a ratio threshold.

16. The method of claim 13, further comprising the step of utilizing
a decision matrix to determine the gene expression call.

17. The method of claim 13, wherein the expression call is
selected from the group consisting of expressed, marginal, and absent.

18. In a computer system, a method of monitoring expression of a
gene in a sample nucleic acid sequence, the method comprising the steps of:

inputting a plurality of hybridization intensities of pairs of perfect
match and mismatch probes, the perfect match probes being perfectly
complementary to the gene and the mismatch probes having at least one base
mismatch with the gene, and the hybridization intensities indicating hybridization
affinity between the perfect match and mismatch probes and the sample nucleic
acid sequence;

comparing the hybridization intensities of each pair of perfect match
probes; and

generating a gene expression call of the sample nucleic acid sequence.

19. The method of claim 18, further comprising the step of
comparing a difference between hybridization intensities of perfect match and
mismatch probes at a base position to a difference threshold.

20. The method of claim 18, further comprising the step of
comparing a quotient of hybridization intensities of perfect match and mismatch
probes at a base position to a ratio threshold.

21. The method of claim 18, further comprising the step of utilizing
a decision matrix to determine the gene expression call.

22. The method of claim 18, wherein the gene expression call is
selected from the group consisting of expressed, marginal, and absent.

23. In a computer system, a method of monitoring change in
expression of a gene in a sample nucleic acid sequence, the method comprising the
steps of:

inputting a plurality of hybridization intensities of pairs of perfect
match and mismatch probes, the perfect match probes being perfectly
complementary to the gene and the mismatch probes having at least one base
mismatch with the gene, and the hybridization intensities indicating hybridization
affinity between the perfect match and mismatch probes and the sample nucleic
acid sequence;

comparing the hybridization intensities of each pair of perfect match
probes in order to generate a gene expression level of the sample nucleic acid
sequence;

determining a change in expression by comparing the gene expression
level to a baseline gene expression level; and

displaying the change in expression of the gene in the sample nucleic
acid.

24. The method of claim 23, wherein the change in expression is
displayed as a graph.

25. The method of claim 23, further comprising the step of
generating the baseline expression level according to the inputting and comparing
steps of claim 23.

26. The method of claim 23, further comprising the step of
comparing hybridization intensities of perfect match and mismatch probes
hybridizing with the sample nucleic acid sequence and hybridization intensities of
perfect match and mismatch probes hybridizing with a baseline sequence to a
difference threshold.

27. The method of claim 23, further comprising the step of
comparing hybridization intensities of perfect match and mismatch probes
hybridizing with the sample nucleic acid sequence and hybridization intensities of
perfect match and mismatch probes hybridizing with a baseline sequence to a ratio
threshold.

28. The method of claim 23, further comprising the step of utilizing
a decision matrix to determine the change in expression of the gene in the sample
nucleic acid.

29. The method of claim 23, wherein the change in expression of
the gene in the sample nucleic acid is selected from the group consisting of
increased, marginal increase, decreased, marginal decrease, and no change.

30. In a computer system, a method of monitoring change in
expression of a gene in a sample nucleic acid sequence, the method comprising the
steps of:

inputting a plurality of hybridization intensities of pairs of perfect
match and mismatch probes, the perfect match probes being perfectly
complementary to the gene and the mismatch probes having at least one base
mismatch with the gene, and the hybridization intensities indicating hybridization
affinity between the perfect match and mismatch probes and the sample nucleic
acid sequence;

comparing the hybridization intensities of each pair of perfect match
probes in order to generate a gene expression level of the sample nucleic acid
sequence; and

determining a change in expression by comparing the gene expression
level to a baseline gene expression level.

31. The method of claim 30, further comprising the step of
generating the baseline expression level according to the inputting and comparing
steps of claim 30.

32. The method of claim 30, further comprising the step of
comparing hybridization intensities of perfect match and mismatch probes
hybridizing with the sample nucleic acid sequence and hybridization intensities of
perfect match and mismatch probes hybridizing with a baseline sequence to a
difference threshold.

33. The method of claim 30, further comprising the step of
comparing hybridization intensities of perfect match and mismatch probes
hybridizing with the sample nucleic acid sequence and hybridization intensities of
perfect match and mismatch probes hybridizing with a baseline sequence to a ratio
threshold.

34. The method of claim 30, further comprising the step of utilizing
a decision matrix to determine the change in expression of the gene in the sample
nucleic acid.

35. The method of claim 30, wherein the change in expression of
the gene in the sample nucleic acid is selected from the group consisting of
increased, marginal increase, decreased, marginal decrease, and no change.

3 Detailed Description of Invention

The present invention relates to the field of computer systems. More
specifically, the present invention relates to computer systems for analyzing
biological sequences such as nucleic acid sequences.

Devices and computer systems for forming and using arrays of
materials on a substrate are known. For example, PCT application WO92/10588,
incorporated herein by reference for all purposes, describes techniques for
sequencing or sequence checking nucleic acids and other materials. Arrays for
performing these operations may be formed according to the methods of,
for example, the pioneering techniques disclosed in U.S. Patent No. 5,143,854 and
U.S. Patent Application No. 08/249,188, both incorporated herein by reference for
all purposes.

According to one aspect of the techniques described therein, an array
of nucleic acid probes is fabricated at known locations on a substrate or chip. A
fluorescently labeled nucleic acid is then brought into contact with the chip and a
scanner generates an image file (which is processed into a cell file) indicating the
locations where the labeled nucleic acids bound to the chip. Based upon the cell
file and identities of the probes at specific locations, it becomes possible to extract
information such as the monomer sequence of DNA or RNA. Such systems have
been used to form, for example, arrays of DNA that may be used to study and
detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain cancers),
HIV, and other genetic characteristics.

Innovative computer-aided techniques for base calling are disclosed in
U.S. Patent Application Nos. 08/531,137 (attorney docket no. 16528X-008210),
08/528,656 (attorney docket no. 16528X-017600), and 08/618,834 (attorney docket no.
16528X-016400), which are all hereby incorporated by reference for all purposes.
However, improved computer systems and methods are still needed to evaluate,
analyze, and process the vast amount of information now used and made available
by these pioneering technologies.

Additionally, there is a need for improved computer-aided techniques
for monitoring gene expression. Many disease states are characterized by
differences in the expression levels of various genes either through changes in the
copy number of the genetic DNA or through changes in levels of transcription (e.g.,
through control of initiation, provision of RNA precursors, RNA processing, etc.) of
particular genes. For example, losses and gains of genetic material play an
important role in malignant transformation and progression. Furthermore,
changes in the expression (transcription) levels of particular genes (e.g., oncogenes or
tumor suppressors) serve as signposts for the presence and progression of various
cancers.

Similarly, control of the cell cycle and cell development, as well as
diseases, are characterized by the variations in the transcription levels of particular
genes. Thus, for example, a viral infection is often characterized by the elevated
expression of genes of the particular virus. For example, outbreaks of Herpes
simplex, Epstein-Barr virus infections (e.g., infectious mononucleosis),
cytomegalovirus, Varicella-zoster virus infections, parvovirus infections, human
papillomavirus infections, etc. are all characterized by elevated expression of
various genes present in the respective virus. Detection of elevated expression
levels of characteristic viral genes provides an effective diagnostic of the disease
state. In particular, viruses such as herpes simplex enter quiescent states for
periods of time only to erupt in brief periods of rapid replication. Detection of
expression levels of characteristic viral genes allows detection of such active
proliferative (and presumably infective) states.

SUMMARY OF THE INVENTION 

The present invention provides innovative systems and methods for
analyzing biological sequences such as nucleic acid sequences. The computer
system may analyze hybridization intensities indicating hybridization affinity
between nucleic acid probes and a sample nucleic acid sequence in order to call
bases in the sample sequence. Multiple base calls may be combined to form a
single base call. Additionally, the computer system may analyze hybridization
intensities in order to monitor gene expression or the change in gene expression as
compared to a baseline.

According to one aspect of the invention, a computer-implemented
method of calling an unknown base in a sample nucleic acid sequence comprises the
steps of: receiving hybridization intensities for a plurality of sets of nucleic acid
probes, each hybridization intensity indicating a hybridization affinity between a
nucleic acid probe and the sample nucleic acid sequence; computing a base call for
the unknown base for each set of probes; and computing a single base call for the
plurality of sets of probes according to the base call for the unknown base which
occurs most often for the plurality of sets of probes. Typically, the single base call
is displayed on a screen display and a user is afforded the opportunity to display or
not display the base calls from which the single base call is derived.
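
The step of combining per-set base calls into a single call can be sketched in
Python as follows. This is only an illustrative sketch: the function name
consensus_base_call is hypothetical, and reporting a tie as 'N' is an assumption
of the sketch, since the passage above does not say how ties are resolved.

```python
from collections import Counter


def consensus_base_call(calls):
    """Return the base call that occurs most often among `calls`.

    `calls` holds one base call per set of probes, e.g. ["A", "A", "G"].
    A tie for first place is reported as "N" (an assumption of this sketch).
    """
    if not calls:
        raise ValueError("at least one base call is required")
    ranked = Counter(calls).most_common()
    # A tie exists when the two most frequent calls have the same count.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "N"
    return ranked[0][0]
```

The individual calls passed in remain available to the caller, which matches the
option of displaying or hiding the base calls from which the single call is derived.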

According to another aspect of the invention, a method of dynamically
changing parameters for a computer-implemented base calling procedure comprises
the steps of: generating base calls for at least a portion of a sample nucleic acid
sequence utilizing the base calling procedure, the base calling procedure including a
parameter that is changeable by a user; displaying the base calls for the at least a
portion of a sample nucleic acid sequence; displaying the parameter of the base
calling procedure; receiving input from the user specifying a new value for the
parameter of the base calling procedure; generating updated base calls for the at
least a portion of a sample nucleic acid sequence utilizing the base calling procedure
and the new value for the parameter; and displaying the updated base calls for the
at least a portion of a sample nucleic acid sequence. Typically the user-changeable
parameter is a constant, threshold, or range.

According to another aspect of the invention, a computer-implemented
method of monitoring expression of a gene in a sample nucleic acid
sequence comprises the steps of: inputting a plurality of hybridization intensities of
pairs of perfect match and mismatch probes, the perfect match probes being
perfectly complementary to the gene and the mismatch probes having at least one
base mismatch with the gene, and the hybridization intensities indicating
hybridization affinity between the perfect match and mismatch probes and the
sample nucleic acid sequence; comparing the hybridization intensities of each pair
of perfect match and mismatch probes; and generating a gene expression call of the
sample nucleic acid sequence. In preferred embodiments, the expression call is
denoted as expressed, marginal, or absent.

According to another aspect of the invention, a computer-implemented
method of monitoring change in expression of a gene in a sample
nucleic acid sequence comprises the steps of: inputting a plurality of hybridization
intensities of pairs of perfect match and mismatch probes, the perfect match probes
being perfectly complementary to the gene and the mismatch probes having at least
one base mismatch with the gene, and the hybridization intensities indicating
hybridization affinity between the perfect match and mismatch probes and the
sample nucleic acid sequence; comparing the hybridization intensities of each pair
of perfect match and mismatch probes in order to generate a gene expression level
of the sample nucleic acid sequence; and determining a change in expression by
comparing the gene expression level to a baseline gene expression level. The change
in expression may be displayed as a graph on the display screen.

A further understanding of the nature and advantages of the
inventions herein may be realized by reference to the remaining portions of the
specification and the attached drawings.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention provides innovative methods of identifying
nucleotides (i.e., base calling) in sample nucleic acid sequences and monitoring gene
expression. In the description that follows, the invention will be described in
reference to preferred embodiments. However, the description is provided for
purposes of illustration and not for limiting the spirit and scope of the invention.

Fig. 1 illustrates an example of a computer system that may be used to
execute software embodiments of the present invention. Fig. 1 shows a computer
system 1 which includes a monitor 3, screen 5, cabinet 7, keyboard 9, and mouse 11.
Mouse 11 may have one or more buttons such as mouse buttons 13. Cabinet 7
houses a CD-ROM drive 15 and a hard drive (not shown) that may be utilized to
store and retrieve software programs including computer code incorporating the
present invention. Although a CD-ROM 17 is shown as the computer readable
medium, other computer readable media including floppy disks, DRAM, hard
drives, flash memory, tape, and the like may be utilized. Cabinet 7 also houses
familiar computer components (not shown) such as a processor, memory, and the
like.

Fig. 2 shows a system block diagram of computer system 1 used to
execute software embodiments of the present invention. As in Fig. 1, computer
system 1 includes monitor 3 and keyboard 9. Computer system 1 further includes
subsystems such as a central processor 50, system memory 52, I/O controller 54,
display adapter 56, removable disk 58, fixed disk 60, network interface 62, and
speaker 64. Removable disk 58 is representative of removable computer readable
media like floppies, tape, CD-ROM, removable hard drive, flash memory, and the
like. Fixed disk 60 is representative of an internal hard drive or the like. Other
computer systems suitable for use with the present invention may include
additional or fewer subsystems. For example, another computer system could
include more than one processor 50 (i.e., a multi-processor system) or memory
cache.

Arrows such as 66 represent the system bus architecture of computer
system 1. However, these arrows are illustrative of any interconnection scheme
serving to link the subsystems. For example, display adapter 56 may be connected
to central processor 50 through a local bus or the system may include a memory
cache. Computer system 1 shown in Fig. 2 is but an example of a computer system
suitable for use with the present invention. Other configurations of subsystems
suitable for use with the present invention will be readily apparent to one of
ordinary skill in the art. In one embodiment, the computer system is a workstation
from Sun Microsystems.

The VLSIPS™ technology provides methods of making very large
arrays of oligonucleotide probes on very small chips. See U.S. Patent 5,143,854
and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is
hereby incorporated by reference for all purposes. The oligonucleotide probes on
the chip are used to detect complementary nucleic acid sequences in a sample
nucleic acid of interest (the "target" nucleic acid).

The present invention provides methods of analyzing hybridization
intensity files for a chip containing hybridized nucleic acid probes. In a
representative embodiment, the files represent fluorescence data from a biological
array, but the files may also represent other data such as radioactive intensity data.
Therefore, the present invention is not limited to analyzing fluorescent
measurements of hybridizations but may be readily utilized to analyze other
measurements of hybridization.

For purposes of illustration, the present invention is described as
being part of a computer system that designs a chip mask, synthesizes the probes on
the chip, labels the nucleic acids, and scans the hybridized nucleic acid probes.
Such a system is fully described in U.S. Patent Application No. 08/249,188 which is
hereby incorporated by reference for all purposes. However, the present invention
may be used separately from the overall system for analyzing data generated by
such systems, such as at remote locations.

Fig. 3 illustrates a computerized system for forming and analyzing
arrays of biological materials such as RNA or DNA. A computer 100 is used to
design arrays of biological polymers such as RNA or DNA. The computer 100 may
be, for example, an appropriately programmed IBM personal computer compatible
running Windows NT including appropriate memory and a CPU as shown in Figs.
1 and 2. The computer system 100 obtains inputs from a user regarding
characteristics of a gene of interest and other inputs regarding the desired features
of the array. Optionally, the computer system may obtain information regarding a
specific genetic sequence of interest from an external or internal database 102 such
as GenBank. The output of the computer system 100 is a set of chip design
computer files 104 in the form of, for example, a switch matrix, as described in PCT
application WO 92/10092, and other associated computer files.

The chip design files are provided to a system 106 that designs the
lithographic masks used in the fabrication of arrays of molecules such as DNA.
The system or process 106 may include the hardware necessary to manufacture
masks 110 and also the necessary computer hardware and software 108 necessary to
lay the mask patterns out on the mask in an efficient manner. As with the other
features in Fig. 3, such equipment may or may not be located at the same physical
site, but is shown together for ease of illustration in Fig. 3. The system 106
generates masks 110 or other synthesis patterns such as chrome-on-glass masks for
use in the fabrication of polymer arrays.

The masks 110, as well as selected information relating to the design of
the chips from system 100, are used in a synthesis system 112. Synthesis system
112 includes the necessary hardware and software used to fabricate arrays of
polymers on a substrate or chip 114. For example, synthesizer 112 includes a light
source 116 and a chemical flow cell 118 on which the substrate or chip 114 is placed.
Mask 110 is placed between the light source and the substrate/chip, and the two
are translated relative to each other at appropriate times for deprotection of selected
regions of the chip. Selected chemical reagents are directed through flow cell 118
for coupling to deprotected regions, as well as for washing and other operations.
All operations are preferably directed by an appropriately programmed computer
119, which may or may not be the same computer as the computer(s) used in mask
design and mask making.

The substrates fabricated by synthesis system 112 are optionally diced
into smaller chips and exposed to marked targets. The targets may or may not be
complementary to one or more of the molecules on the substrate. The targets are
marked with a label such as a fluorescein label (indicated by an asterisk in Fig. 3)
and placed in scanning system 120. Scanning system 120 again operates under the
direction of an appropriately programmed digital computer 122, which also may or
may not be the same computer as the computers used in synthesis, mask making,
and mask design. The scanner 120 includes a detection device 124 such as a confocal
microscope or CCD (charge-coupled device) that is used to detect the location
where labeled target (*) has bound to the substrate. The output of scanner 120 is an
image file(s) 124 indicating, in the case of fluorescein labeled target, the fluorescence
intensity (photon counts or other related measurements, such as voltage) as a
function of position on the substrate. Since higher photon counts will be observed
where the labeled target has bound more strongly to the array of polymers, and
since the monomer sequence of the polymers on the substrate is known as a
function of position, it becomes possible to determine the sequence(s) of polymer(s)
on the substrate that are complementary to the target.

The image file 124 is provided as input to an analysis system 126 that
incorporates the visualization and analysis methods of the present invention.
Again, the analysis system may be any one of a wide variety of computer system(s).
The present invention provides various methods of analyzing the chip design files
and the image files, providing appropriate output 128. The present invention may
further be used to identify specific mutations in a target such as DNA or RNA.

Fig. 4 provides a simplified illustration of the overall software system
used in the operation of one embodiment of the invention. As shown in Fig. 4, the
system first identifies the genetic sequence(s) or targets that would be of interest in a
particular analysis at step 202. The sequences of interest may, for example, be
normal or mutant portions of a gene, genes that identify heredity, or provide
forensic information. Sequence selection may be provided via manual input of text
files or may be from external sources such as GenBank. At step 204 the system
evaluates the gene to determine or assist the user in determining which probes
would be desirable on the chip, and provides an appropriate "layout" on the chip for
the probes.

The chip usually includes probes that are complementary to a
reference nucleic acid sequence which has a known sequence. A wild-type probe
is a probe that will ideally hybridize with the reference sequence and thus a
wild-type gene (also called the chip wild-type) would ideally hybridize with wild-type
probes on the chip. The target sequence is substantially similar to the reference
sequence except for the presence of mutations, insertions, deletions, and the like.
The layout implements desired characteristics such as arrangement on the chip that
permits "reading" of genetic sequence and/or minimization of edge effects, ease of
synthesis, and the like.

Fig. 5 illustrates the global layout of a chip. Chip 114 is composed of
multiple units where each unit may contain different tilings for the wild-type
sequence or multiple wild-type sequences. Unit 1 is shown in greater detail and
shows that each unit is composed of multiple cells which are areas on the chip that
may contain probes. Conceptually, each unit includes multiple sets of related cells.
As used herein, the term cell refers to a region on a substrate that contains many
copies of a molecule or molecules (e.g., nucleic acid probes).

Each unit is composed of multiple cells that may be placed in rows (or
"lanes") and columns. In one embodiment, a set of five related cells includes the
following: a wild-type cell 220, "mutation" cells 222, and a "blank" cell 224. Cell
220 contains a wild-type probe that is the complement of a portion of the wild-type
sequence. Cells 222 contain "mutation" probes for the wild-type sequence. For
example, if the wild-type probe is 3'-ACGT, the probes 3'-ACAT, 3'-ACCT, 3'-ACGT,
and 3'-ACTT may be the "mutation" probes. Cell 224 is the "blank" cell because it
contains no probes (also called the "blank" probe). As the blank cell contains no
probes, labeled targets should not bind to the chip in this area. Thus, the blank cell
provides an area that can be used to measure the background intensity.
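
The relationship between a wild-type probe and its "mutation" probes can be
sketched in Python as follows. The function name mutation_probes and the
0-based position argument are assumptions made for illustration; they do not
appear in the specification.

```python
BASES = "ACGT"


def mutation_probes(wild_type_probe, position):
    """Return the four probes carrying A, C, G and T at `position` (0-based).

    One of the four results always equals the wild-type probe itself.
    """
    if not 0 <= position < len(wild_type_probe):
        raise IndexError("interrogation position is outside the probe")
    head = wild_type_probe[:position]
    tail = wild_type_probe[position + 1:]
    return [head + base + tail for base in BASES]
```

For the wild-type probe ACGT with the third base varied (position 2), the
sketch yields ACAT, ACCT, ACGT, and ACTT, the probes named in the example above.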

Again referring to Fig. 4, at step 206 the masks for the synthesis are
designed. At step 208 the software utilizes the mask design and layout
information to make the DNA or other polymer chips. This software 208 will
control, among other things, relative translation of a substrate and the mask, the
flow of desired reagents through a flow cell, the synthesis temperature of the flow
cell, and other parameters. At step 210, another piece of software is used in
scanning a chip thus synthesized and exposed to a labeled target. The software
controls the scanning of the chip, and stores the data thus obtained in a file that may
later be utilized to extract sequence information.

At step 212 a computer system utilizes the layout information and the
fluorescence information to evaluate the hybridized nucleic acid probes on the chip.
Among the important pieces of information obtained from DNA chips are the
identification of mutant targets and determination of genetic sequence of a
particular target.

Fig. 6 illustrates the binding of a particular target DNA to an array of
DNA probes 114. As shown in this simple example, the following probes are
formed in the array (only one probe is shown for the wild-type probe):

3'-AGAACGT
   AGACCGT
   AGAGCGT
   AGATCGT


As shown, the set of probes differ by only one base, a single base mismatch at an
interrogation position, so the probes are designed to determine the identity of the
base at that location in the nucleic acid sequence. Accordingly, when used herein a
unit will refer to multiple sets of related probes, where each set includes probes that
differ by a single base mismatch at an interrogation position.

When a fluorescein-labeled (or other marked) target with the sequence
5'-TCTTGCA is exposed to the array, it is complementary only to the probe
3'-AGAACGT, and fluorescein will be primarily found on the surface of the chip
where 3'-AGAACGT is located. Thus, for each set of probes that differ by only one
base, the image file will contain four fluorescence intensities, one for each probe.
Each fluorescence intensity can therefore be associated with the nucleotide or base
of each probe that is different from the other probes. Additionally, the image file
will contain a "blank" cell which can be used as the fluorescence intensity of the
background. By analyzing the five fluorescence intensities associated with a
specific base location, it becomes possible to extract sequence information from such
arrays using the methods of the invention disclosed herein.
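
The complementarity that determines which probe the example target binds can
be checked with a short Python sketch. The helper names complement and
matching_probes are hypothetical. The sketch assumes the target is written 5' to
3' and the probes 3' to 5', so that bases pair position by position.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Return the base-by-base Watson-Crick complement of `seq`."""
    return "".join(COMPLEMENT[base] for base in seq)


def matching_probes(target_5to3, probes_3to5):
    """Return the probes that pair with the target at every position."""
    wanted = complement(target_5to3)
    return [probe for probe in probes_3to5 if probe == wanted]
```

Applied to the target TCTTGCA and the four probes of Fig. 6, only AGAACGT is
returned, in agreement with the passage above.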

Fig. 7 illustrates probes arranged in lanes on a chip. A reference
sequence (or chip wild-type sequence) is shown with five interrogation positions
marked with number subscripts. An interrogation position is oftentimes a base
position in the reference sequence where the target sequence may contain a
mutation or otherwise differ from the reference sequence. The chip may contain
five probe cells that correspond to each interrogation position. Each probe cell
contains a set of probes that have a common base at the interrogation position. For
example, at the first interrogation position, I1, the reference sequence has a base T.
The wild-type probe for this interrogation position is 3'-TGAC where the base A in
the probe is complementary to the base at the interrogation position in the reference
sequence.

Similarly, there are four "mutant" probe cells for the first interrogation
position, I1. The four mutant probes are 3'-TGAC, 3'-TGCC, 3'-TGGC, and 3'-TGTC.
Each of the four mutant probes vary by a single base at the interrogation position.
As shown, the wild-type and mutant probes are arranged in lanes on the chip.
One of the mutant probes (in this case 3'-TGAC) is identical to the wild-type probe
and therefore does not evidence a mutation. However, the redundancy gives a
visual indication of mutations as will be seen in Fig. 8.

Still referring to Fig. 7, the chip contains wild-type and mutant probes
for each of the other interrogation positions. In each case, the wild-type probe
is equivalent to one of the mutant probes.

Fig. 8 illustrates a hybridization pattern of a target on a chip with a
reference sequence as in Fig. 7. The reference sequence is shown along the top of
the chip for comparison. The chip includes a WT-lane (wild-type), an A-lane, a
C-lane, a G-lane, and a T-lane (or U). Each lane is a row of cells containing probes.
The cells in the WT-lane contain probes that are complementary to the reference
sequence. The cells in the A-, C-, G-, and T-lanes contain probes that are
complementary to the reference sequence except that the named base is at the
interrogation position.

In one embodiment, the hybridization of probes in a cell is determined
by the fluorescent intensity (e.g., photon counts) of the cell resulting from the
binding of marked target sequences. The fluorescent intensity may vary greatly
among cells. For simplicity, Fig. 8 shows a high degree of hybridization by a cell
containing a darkened area. The WT-lane allows a simple visual indication that
there is a mutation at an interrogation position because the wild-type cell is not dark
at that position. The cell in the C-lane is darkened which indicates that the
mutation is from T->G (mutant probe cells are complementary so the C-cell
indicates a G mutation). In a preferred embodiment, the WT-lane is not utilized
so four cells (not including any "blank" cell) are utilized to call a base at an
interrogation position.

In practice, the fluorescent intensities of cells near an interrogation
position having a mutation are relatively dark, creating "dark regions" around a
mutation. The lower fluorescent intensities result because the cells at interrogation
positions near a mutation do not contain probes that are perfectly complementary to
the target sequence; thus, the hybridization of these probes with the target sequence
is lower. For example, the relative intensity of the cells at interrogation positions
near the mutation may be relatively low because none of the probes therein are
complementary to the target sequence. Although the lower fluorescent intensities
reduce the resolution of the data, the methods of the present invention provide
highly accurate base calling within the dark regions around a mutation and are able
to identify other mutations within these regions.

Fig. 9 illustrates standard and alternate tilings on a chip. As shown,
the chip includes twelve units. One group of units is tiled (i.e., designed and
synthesized on the chip) to include probes complementary to the same reference
sequence. For identification purposes, this group of units will be called the
standard group. In general, base calls for the target sequence will be performed
utilizing the standard group unless the invention determines that another group or
groups should be utilized.

Another group of units is tiled to include probes complementary to the same
reference sequence, but a reference sequence that differs from the reference
sequence for the standard group. This group of units will be called an alternate
group. A further group of units comprises another alternate group that is based on a
reference sequence that is different from the reference sequences of the standard and
first alternate groups. Although the reference sequences are different, they are often
quite similar. For example, the reference sequences may be slightly different
mutations of HIV. Embodiments of the present invention evaluate and utilize
information from tilings based on reference sequences that would typically not be
used in base calling the target sequence.

The units within a group may include identical probes, probes of
different structure, probes from the same or different chips, and the like. For
example, one unit may include probes with the interrogation position at the
third position in the probes. Another unit may include 10-mer probes with an
interrogation position at the sixth position. Additionally, these units may have
been tiled on the same or different chips.

The expanded section at the bottom left portion of Fig. 9 illustrates
that each block of a unit typically includes four cells denoted A, C, G, and T. The
base designations specify which base is at the interrogation position of each probe
within the cell. Typically, there are hundreds or thousands of identical nucleic acid
probes within each cell.

Although in preferred embodiments the cells may be arranged
adjacent to each other in sequential order along the reference sequence, there is no
requirement that the cells be in any particular location as long as the location on the
chip is determinable. Additionally, although it may be beneficial to synthesize the
different groups on a single chip for consistency of experiments, the methods of the
present invention may be advantageously utilized with data from different tilings
on different chips.



Fig. 10 shows a screen display of hybridization intensities from a chip.
During analysis, the system receives an image file including the scanned image of
the hybridized chip. In a preferred embodiment, the image file shows fluorescent
intensities and locations that labeled target nucleic acid sequences or fragments
bound to the chip.

A screen display 260 utilizes the common windowing graphical user
interface. The user may select to display the image file for inspection. After the
user selects the image file to be displayed, a window 262 is displayed that includes
the image file. The image file shown includes multiple rows of A-, C-, G-, and
T-lanes.

As the user moves the cursor over the displayed image file, a status
bar 264 indicates the X and Y position of the cursor and fluorescent intensity at
that position. Additionally, the user is able to utilize the pointing device to select a
rectangular area of the image file in order to manipulate the subimage. For
example, the user may magnify the subimage so that the individual cells may be
seen more clearly. Additionally, the user may adjust the contrast of the intensities
to bring to light some differences in hybridization intensity that are not apparent at
the current contrast setting.

Fig. 11 is a flowchart of a process of computing a base call from
hybridization intensities of related probes. When used herein, "related probes" are
probes that differ by a nucleotide base at an interrogation position. Although
typically the probes are identical except at the interrogation position, the probes
may differ at other base positions as well. Accordingly, the related probes differ
by at least one base.

At step 302 the hybridization intensities of the four related probes are
adjusted by subtracting the background or "blank" cell intensity. Preferably, if a
hybridization intensity is then less than or equal to zero, the hybridization intensity
is set equal to a small positive number to prevent division by zero or negative
numbers in future calculations.

At step 304, the hybridization intensities are sorted by intensity. The
highest intensity is then compared to a predetermined background difference cutoff
at step 306. The background difference cutoff is a number that specifies the
hybridization intensity the highest intensity probe must be over the background
intensity in order to correctly call the unknown base. Thus, the background
adjusted base intensity must be greater than the background difference cutoff or the
unknown base is deemed to be not accurately callable.

If the highest hybridization intensity of the related probes is not
greater than the background difference cutoff, the unknown base is assigned the
code 'N' (insufficient intensity) as shown at step 308. Otherwise, the ratio of the
highest hybridization intensity and second highest hybridization intensity is
calculated as shown at step 310.

At step 312, the ratio calculated at step 310 is compared to a
predetermined ratio cutoff. The ratio cutoff is a number that specifies the ratio
required to identify the unknown base. In preferred embodiments, the ratio cutoff
is 1.2. If the ratio is greater than the ratio cutoff, the unknown base is called
according to the probe with the highest hybridization intensity. Typically, the base
is called as the complement of the base at the interrogation position in the highest
intensity probe as shown at step 314. Otherwise, the ratio of the second highest
hybridization intensity and third highest hybridization intensity is calculated as
shown at step 316.

At step 318, the ratio calculated at step 316 is compared to the ratio
cutoff. If the ratio is greater than the ratio cutoff, the unknown base is called as
being an ambiguity code specifying the complements of interrogation position bases
of the highest hybridization intensity probe and the second highest hybridization
probe as shown at step 320. Otherwise, the ratio of the third highest hybridization
intensity and fourth highest hybridization intensity is calculated as shown at step
322.

At step 324, the ratio calculated at step 322 is compared to the ratio
cutoff. If the ratio is greater than the ratio cutoff, the unknown base is called as
being an ambiguity code specifying the complements of interrogation position bases
of the highest, second highest, and third highest hybridization intensity probes as
shown at step 326. Otherwise, the unknown base is assigned the code 'X'
(insufficient discrimination) as shown at step 328.
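
The flow of Fig. 11 (steps 302 through 328) can be sketched in Python as
follows. This is an illustrative sketch under stated assumptions, not the
implementation of the invention: the function name call_base_ratio, the default
background difference cutoff of 50, and the small positive floor value are
hypothetical, and the use of IUPAC letters for the ambiguity codes is an
assumption, since the text refers only to "an ambiguity code". The ratio cutoff
of 1.2 is the value given for preferred embodiments.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# IUPAC ambiguity letters keyed by the set of bases they stand for
# (assumed here; the text does not name a specific code table).
IUPAC = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
}


def call_base_ratio(intensities, background,
                    bg_diff_cutoff=50.0, ratio_cutoff=1.2, floor=1e-6):
    """Call a base from four related probes.

    `intensities` maps the base at each probe's interrogation position
    ("A", "C", "G", "T") to that probe's raw hybridization intensity.
    """
    # Step 302: subtract background; clamp non-positive values to a
    # small positive number so later ratios are well defined.
    adjusted = {}
    for base, value in intensities.items():
        diff = value - background
        adjusted[base] = diff if diff > 0 else floor

    # Step 304: sort probes from highest to lowest adjusted intensity.
    ranked = sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
    bases = [b for b, _ in ranked]
    values = [v for _, v in ranked]

    # Steps 306/308: insufficient intensity.
    if values[0] <= bg_diff_cutoff:
        return "N"

    # Steps 310-314: the highest probe clearly beats the second highest.
    if values[0] / values[1] > ratio_cutoff:
        return COMPLEMENT[bases[0]]

    # Steps 316-320: the top two probes clearly beat the third.
    if values[1] / values[2] > ratio_cutoff:
        return IUPAC[frozenset(COMPLEMENT[b] for b in bases[:2])]

    # Steps 322-326: the top three probes clearly beat the fourth.
    if values[2] / values[3] > ratio_cutoff:
        return IUPAC[frozenset(COMPLEMENT[b] for b in bases[:3])]

    # Step 328: insufficient discrimination.
    return "X"
```

Because the cutoffs are ordinary arguments, the same function can be called
again with a new ratio cutoff or background difference cutoff to regenerate base
calls after a user changes a parameter.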

Fig. 12 is a flowchart of another process of computing a base call from
hybridization intensities of related probes. The flowchart shown operates on
hybridization intensities demonstrated by related probes; thus, a base call is made
for the base in the target corresponding to the interrogation position in probes that
differ by a single base mismatch at the interrogation position. At step 402, the
system determines if there is one probe with the highest hybridization to the target
sequence. If there is not, the base is called as an 'N' meaning ambiguous. For
example, if two probes have the same highest intensity (i.e., there is a tie), the base
would be called as 'N'.

If there is a single probe that has the highest hybridization to the target,
the base is called according to that probe at step 406. Since the probes are
complementary to the target sequence, the base may be called as the complementary
base (C/G, A/T) to the base at the interrogation position of the probe.

At step 408, the system determines if the base call is a mutant,
meaning it is different from the base in the reference sequence. If the base is
not a mutant base call, the base call has been made. Otherwise, the system
checks to make sure certain "mutant" conditions are met at step 410 or
the base is called as 'N' at step 412.
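
The flow of steps 402 through 412 can be sketched as below. The function name
and arguments are hypothetical, and the outcome of the mutant conditions is
passed in as a plain boolean (mutant_conditions_ok) so that the sketch does not
depend on how those conditions are evaluated.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base_unique_high(intensities, reference_base, mutant_conditions_ok):
    """intensities maps each probe's interrogation base to its intensity."""
    top = max(intensities.values())
    winners = [base for base, value in intensities.items() if value == top]
    if len(winners) != 1:
        return "N"  # tie for the highest intensity: ambiguous
    call = COMPLEMENT[winners[0]]
    if call == reference_base:
        return call  # not a mutant call, so the call stands
    # A mutant call is kept only if the mutant conditions are met.
    return call if mutant_conditions_ok else "N"
```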

Before describing the mutant conditions for one embodiment, it may
be beneficial to give labels to the hybridization intensities of the related probes.
For illustration purposes, "HighInt" will refer to the highest hybridization intensity,
"SecondInt" will refer to the second highest hybridization intensity, "ThirdInt" will
refer to the third highest hybridization intensity, and "LowInt" will refer to the
lowest hybridization intensity.

In one embodiment, the mutant conditions include three tests that
must all be met to call the base a mutant. A first test is whether the difference
between HighInt and SecondInt is greater than a difference cutoff. Thus, the
system determines if HighInt - SecondInt is greater than a predefined value. This
value should be chosen to allow mutant base calls only when the highest
hybridization intensity is greater than the next highest hybridization intensity by a
desired amount.

A second test is whether a first ratio is less than a first ratio cutoff.
The first ratio is the following:

(SecondInt - sqrt(ThirdInt * LowInt)) / (HighInt - sqrt(ThirdInt * LowInt))

The system determines if this first ratio is less than a predefined value. This value
should be chosen to allow mutant base calls only when the highest hybridization
intensity is a desired ratio greater than the next highest hybridization intensity even
after the lowest two hybridization intensities are subtracted out.

A third test is whether a neighbor ratio is greater than a neighbor ratio
cutoff. The neighbor ratio is the following:

HighInt_n / (HighInt_n - sqrt(HighInt_(n+1) * HighInt_(n-1)))

where the subscript n designates values for the base position that is being called and
n+1 and n-1 represent values for adjacent base positions. Thus, the system
determines if the neighbor ratio is greater than a predefined value. This value
should be chosen to allow mutant base calls only when the highest hybridization
intensity is a desired ratio greater than the highest hybridization intensity with the
adjacent highest hybridization intensities subtracted out.

Accordingly, in a preferred embodiment only if all of the mutant
conditions are met will the base be called a mutant base. This embodiment
recognizes that mutations are fairly rare so a mutant base should only be called
when there is a high likelihood that there has been a mutation. If the mutant
conditions are not met, the base may be called as ambiguous or as the same as the
reference sequence (which statistically may be the correct base call).
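
The three mutant tests can be sketched as one function. This is a sketch under
stated assumptions: the ratios are coded as they are read from the text, no
cutoff values are given in the text so all three are caller-supplied parameters,
and the handling of a non-positive denominator (treated as a failed test) is an
assumption of the example.

```python
from math import sqrt


def mutant_conditions_met(high, second, third, low, high_prev, high_next,
                          diff_cutoff, first_ratio_cutoff,
                          neighbor_ratio_cutoff):
    """Return True only if all three mutant tests pass.

    high, second, third, low: HighInt, SecondInt, ThirdInt, LowInt at the
    position being called. high_prev, high_next: HighInt at the adjacent
    positions n-1 and n+1.
    """
    # Test 1: HighInt must exceed SecondInt by more than the cutoff.
    if high - second <= diff_cutoff:
        return False

    # Test 2: first ratio, with the two lowest intensities subtracted out.
    floor = sqrt(third * low)
    if high - floor <= 0:
        return False  # assumption: an undefined ratio fails the test
    first_ratio = (second - floor) / (high - floor)
    if first_ratio >= first_ratio_cutoff:
        return False

    # Test 3: neighbor ratio, with adjacent highest intensities subtracted.
    neighbor_floor = sqrt(high_prev * high_next)
    if high - neighbor_floor <= 0:
        return False  # assumption: an undefined ratio fails the test
    neighbor_ratio = high / (high - neighbor_floor)
    return neighbor_ratio > neighbor_ratio_cutoff
```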

Although a preferred embodiment utilizes three mutant conditions,
other embodiments may use a single mutant condition (e.g., one of the conditions
described above). Other embodiments may utilize other base calling methods
including the ones described in the U.S. Patent Applications previously
incorporated by reference.

Fig. 13 is a flowchart of a process of calling bases in a group of units.
As indicated earlier, a unit includes multiple sets of related cells, where the related
cells include probes that differ by a single base at an interrogation position. In a
typical embodiment, the system initially receives input on the hybridization
intensities (e.g., from the image data file produced by a scanner that scans the
hybridized chip) and the structure of the probes that correspond to the
hybridization intensities. In preferred embodiments, the background intensity
(e.g., intensity measured from "blank" cells or other areas of the chip without probes) is
subtracted from the measured hybridization intensities. The background
subtracted hybridization intensities may also be limited to have a minimum
hybridization intensity of 1 (e.g., one photon count).
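
A minimal sketch of background subtraction with a floor of 1 follows. The
function name and the single scalar background value are assumptions of the
example; the text does not say how the background is estimated.

```python
def subtract_background(intensities, background, floor=1):
    """Subtract the background and clamp each result to a minimum value."""
    return [max(value - background, floor) for value in intensities]
```

Clamping keeps every intensity positive, so later ratio calculations never
divide by zero or by a negative number.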

The hybridization intensity describes the extent of hybridization that
was measured between a probe (or multiple copies of a probe) and the target
sequence. As an example, the hybridization intensity may refer to the mean of the
photon counts recorded from a cell, the photon counts resulting from fluorescein
labeled target sequences that bound to probes in the cell.

At step 452, the system gets a base position in the target sequence to be
called. The system then computes a base call for each unit of the group at step 454.
Therefore, the hybridization intensities for the related cells of each unit at the base
position are analyzed. With this analysis (embodiments of which were described
in more detail in reference to Figs. 11 and 12), the system computes a base call for
each unit. Thus, if there are five units in the group, five base calls may be
produced.

The system analyzes the base calls of the units of the group at step 456
in order to compute a base call for the group. In one embodiment, the system calls
the base according to the base which is called most often by the units. For example,
if there are five units and the following base calls were made for each unit:

'T' - three units

'G' - one unit

'N' - one unit

The base will be called a 'T' since three out of five units agree. Ties may be broken
by analyzing other factors like the highest average hybridization intensity of the
unit or units that call each base in the tie. In a preferred embodiment, the
invention utilizes the process described in Fig. 15.
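
One way to sketch the "most often called" rule with an intensity-based tie
break is shown below. The input shape (one (base, intensity) pair per unit) and
the function name are assumptions of the example, and the tie break uses the
mean intensity of the units voting for each base, which is only one reading of
the factor mentioned above.

```python
from collections import defaultdict


def call_group(unit_calls):
    """unit_calls: non-empty list of (base_call, unit_intensity) pairs."""
    by_base = defaultdict(list)
    for base, intensity in unit_calls:
        by_base[base].append(intensity)
    # Rank by number of units first, then by mean intensity of those units.
    return max(
        by_base,
        key=lambda b: (len(by_base[b]), sum(by_base[b]) / len(by_base[b])),
    )
```

Note that this sketch treats 'N' like any other call, so a group in which most
units are ambiguous is itself called 'N'.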

At step 458, it is determined whether there is a next base position to
analyze. The present invention may be utilized to call all the bases of a target
nucleic acid sequence so the process may, in effect, "walk" through the base
positions. Additionally, the invention may be utilized to call only certain base
positions (e.g., mutation positions) so the process may skip certain base positions
altogether.

Fig. 14 is a flowchart of a process of calling bases for multiple groups
of units. As shown in Fig. 9, there may be multiple groups on one or more chips
that are to be analyzed. The multiple groups may be tiled according to different
reference sequences; however, this does not mean that all of their hybridization
information may not be utilized. Typically, it is assumed that the reference
sequence for the standard group is expected to be the most identical to the target
sequence. However, if one of the alternate groups is determined to be more
identical (i.e., better for making a base call), then that group will be used to make the
base call.

At step 502, the system computes base calls in the units of the
standard and alternate groups. The base calling may be done as was described in
reference to Fig. 13.

The system then computes a base call for each group of units at step
504. This may be accomplished by determining the base that is called most often
by the units. Alternatively, the base call for the group may be determined utilizing
the process which will be described in more detail in reference to Fig. 15.

After the system has determined a base call for each group of units
(both the standard and alternate tilings), the system identifies a base position at step
506. The system then determines the best group of units for this base position to be
utilized to make the base call. In general, selecting the best group may involve
determining which reference sequence of the groups has the fewest mismatches
with the target sequence near or in a window around the interrogation position.
The group of units that has the fewest mismatches near the interrogation position
may have the highest likelihood of producing the most accurate base call. An
embodiment of selecting the best group will be described in more detail in reference
to Fig. 16.

At step 510, the system calls the base at the identified base position
according to the best group of units (i.e., utilizing the base call for the group that
was computed at step 504). Once the base call has been made, the system
determines if there is a next base position to perform a base call. If there is another
base position to be called, the system proceeds to call that base position at step 506.

Fig. 15 is a flowchart of a process of calling a base for a group of units.
At step 602, the system determines if a majority of units call the same base at the
specified base position. The majority is determined upon reference to only those
units that call a base (e.g., do not call as ambiguous or 'N'). For example, assume
that there are seven units and the following base calls have been made for the units:

'G' - three units
'T' - one unit
'N' - four units

Since three out of four of the nonambiguous base calls are 'G', the system will
initially call the base as a 'G' for the group of units. The base will be called as the
majority base unless an exception rule is applied at step 604.
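
The majority test of step 602, counted only over units that made a definite
call, can be sketched as follows. The function name is hypothetical, and treating
'X' as well as 'N' as ambiguous is an assumption of the example. Exception rules
are not modeled; the function returns None when there is no majority so that a
caller could apply them.

```python
from collections import Counter


def majority_base(unit_calls, ambiguous=("N", "X")):
    """Return the base called by a majority of the non-ambiguous units."""
    definite = [call for call in unit_calls if call not in ambiguous]
    if not definite:
        return None
    base, count = Counter(definite).most_common(1)[0]
    # A majority means strictly more than half of the definite calls.
    return base if count > len(definite) / 2 else None
```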

The exception rules specify conditions which dictate what base call
should be made for the group of units. These rules may include conditions that
change a majority base call and may include conditions to deal with situations when
there is not a base call that a majority of units call. In a preferred embodiment, the
exception rules include tie breaking rules which analyze the hybridization intensity
of neighboring probes (e.g., one unit calls one base and another unit calls a different
base). Additionally, the exception rules specify that if three units call different
bases with one of the calls being for the reference base, the system should call the
base as the reference for the group of units. Other exception rules are described in
the Appendix.

At step 606, the system determines if an exception rule applies. If an
exception rule does apply, the rule is applied at step 608.

Fig. 16 is a flowchart of a process of selecting a best group of units for
performing a base call. Selecting the best group involves determining which
reference sequence of the groups has the fewest mismatches with the target
sequence near the interrogation position. The group of units that has the fewest
mismatches near the interrogation position may have the highest likelihood of
producing the most accurate base call. The window around the interrogation
position which is analyzed may be a set value or set according to the probe
structure. For example, if the maximum distance that the probes for all the groups
extend from the interrogation position is eight base positions to one side of the
interrogation position and ten base positions to the other side of the interrogation
position, the window may be set as including this range of base positions.

At step 702, the system calculates mismatch scores for the standard
and alternate groups of units. The mismatch score is an indication of how many
mismatches a reference sequence appears to have with the target sequence. In
order to determine a mismatch score, the system may only analyze base positions
where at least two of the reference sequences differ. Thus, if all the reference
sequences are identical at a base position, this base position may be skipped.

At each base position where at least two reference sequences differ,
the system determines if the base call for a group (the base call indicating the likely
base in the target sequence) at each of these positions differs from the corresponding
base of the reference sequence. If the base call and the base for the reference
sequence differ, the mismatch score is incremented by one. Initially, the mismatch
score for each group is set to zero.

Conceptually, it should be understood that the mismatch score is an
indication of the number of base positions in a portion of the reference sequence
that differ from the target sequence (optionally excluding those positions where all
the reference sequences are the same). To better illustrate this concept, the
following simple example is presented. Assume there is a standard group and two
alternate groups as follows:

Standard Group                      Mismatch Score
reference     ACGGATGAGATACGA       1
base calls    ACTGATGAGATACGA

Alternate Group 1                   Mismatch Score
reference     ACTGATGAGATACGA       0
base calls    ACTGATGAGATACGA

Alternate Group 2                   Mismatch Score
reference     ACGGATGAGATACGT       2
base calls    ACTGATGAGATACGA

The underlined bases correspond to the base position which is being analyzed.
The bold base positions indicate base positions where at least two of the reference
sequences differ (here, the third and fifteenth positions). At these bold base
positions, the standard group has one base
position where the reference sequence differs from the target sequence (as indicated
by the base calls) so the mismatch score is 1. Similarly, the first alternate group has
a mismatch score of 0 and the second alternate group has a mismatch score of 2.

As alternate group 1 has the lowest mismatch score, that group would
be utilized to call the base at the base position being analyzed. In this simple
example, the base call is not different for any of the groups as this example is
intended to illustrate how the best group may be selected. However, what is
important is that the invention recognizes that the more mismatches that occur near
a base position, the less accurate the base call will become. This result is brought
upon by the fact that a mismatch between the reference sequence and the target
sequence creates an area where the probes interrogating neighboring base
positions include a single base mismatch. Single base mismatches lower the
hybridization intensity and may produce inaccurate results.
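
The mismatch score calculation can be sketched with the three example groups.
The function name and the dictionary layout are assumptions of the example; the
sequences are the ones from the example above, and the sketch scores the whole
sequence rather than a window around the interrogation position.

```python
def mismatch_scores(references, base_calls):
    """references and base_calls map group name -> equal-length sequence."""
    length = len(next(iter(references.values())))
    # Only positions where at least two reference sequences differ count.
    informative = [
        i for i in range(length)
        if len({ref[i] for ref in references.values()}) > 1
    ]
    return {
        name: sum(references[name][i] != base_calls[name][i]
                  for i in informative)
        for name in references
    }


references = {
    "standard": "ACGGATGAGATACGA",
    "alternate1": "ACTGATGAGATACGA",
    "alternate2": "ACGGATGAGATACGT",
}
base_calls = {name: "ACTGATGAGATACGA" for name in references}
scores = mismatch_scores(references, base_calls)
```

For these inputs the scores are 1 for the standard group, 0 for the first
alternate group and 2 for the second alternate group.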

At the next step, the system determines if a mismatch score of the standard
group is less than or equal to the mismatch scores of the alternate groups. If the
standard group has the lowest mismatch score (or ties), then the base call is performed
according to the standard group.

The system determines if a single alternate group has the lowest
mismatch score at step 708. If so, that alternate group is utilized to make the base
call at step 710. Otherwise, there is more than one alternate group that has the
same mismatch score. If this is the case, the alternate group may be chosen which
includes units that most consistently call the base at step 712. For example, if
two alternate groups have the same lowest mismatch score but one group's units all
called the same base and the other group's units were split, the alternate group that
called the same base would be utilized. Other methods of determining the best
group in the event of a mismatch score tie may also be utilized.
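
The selection logic of Fig. 16 can be sketched as follows. The function name,
the "standard" key and the consistency measure (the fraction of a group's units
that agree with its most common call) are assumptions of the example.

```python
def select_best_group(scores, unit_calls, standard="standard"):
    """scores: group -> mismatch score; unit_calls: group -> list of calls."""
    best = min(scores.values())
    # The standard group wins if it has the lowest score, including ties.
    if scores[standard] == best:
        return standard
    tied = [g for g in scores if g != standard and scores[g] == best]
    if len(tied) == 1:
        return tied[0]

    def consistency(group):
        calls = unit_calls[group]
        return max(calls.count(c) for c in set(calls)) / len(calls)

    # Several alternates tie: prefer the one whose units agree most.
    return max(tied, key=consistency)
```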

Fig. 17A shows a screen display allowing analysis of nucleotides from
experiments from one or more chips. A screen display 802 includes multiple
screen areas that display different information to the user. A screen area 804
includes the name of a reference sequence which in this example is PRT 440A, which
are antisense regions (Protease Reverse Transcriptase) of the HIV virus. The
reference sequence is typically used as a baseline with which to compare sample
sequences. Although the reference sequence on the screen may be the chip
wild-type sequence for which the chips were tiled, there is no requirement that this
is the case.

A screen area 806 includes the nucleotide sequence for the reference
sequence for the probe array. The base position of each nucleotide is shown above
screen area 806. Screen area 806 also shows the reference sequence for each unit if
"expanded" in the user interface.

A screen area 808 shows the user the chip and composite files that are
currently being analyzed. A chip file (e.g., ends in '.CHP') includes data obtained
from a single chip. A composite file (e.g., ends in '.CMF') includes data obtained
from more than one chip. When a user opens a chip or composite file for analysis,
the pathname of the file is displayed in screen area 808.

Information from the chip and composite files may be displayed in
screen areas 810 and 812. Screen area 810 includes the names of sample sequences
currently being analyzed from the chip or composite files. The name of the sample
sequence is typically chosen to enable the user to readily determine what the
sample sequence represents. Screen area 812 includes the nucleotide sequence for
the sample sequences. The base position of each nucleotide in this screen area is the
same as indicated above screen area 806. Accordingly, the system automatically
aligns the reference and sample sequences for easier analysis.

Fig. 17A has been described in order to familiarize the reader with the
layout of the screen display. However, as illustrated by Fig. 17B, the invention
allows the user to hide (not display) and summarize information from chip and
composite files. For example, if a user "clicks on" or activates the screen icon plus
sign in front of the composite filename in screen area 808, the system displays more
information about the composite file. As shown, the method that was utilized to
combine the information from the chip files may be shown along with the
individual chip files.

Additionally, if a user activates the screen icon plus sign in front of the
chip filename in screen area 808, the system displays more information about the
chip file including the process or procedure that was utilized to call bases. In the
figure, the base calling procedure was the "Ratio Base Algorithm" which was
described in reference to Fig. 10. Additionally, the user is able to modify
parameters for the base calling procedure which will be immediately reflected in the
base calls shown on the display screen. For example, the ratio cutoff ("Ratio") is
displayed as 1.2. If a user increases the ratio cutoff to 1.4, the system would then
recalculate the base calls for the chip and the new base calls would be reflected in
screen area 812. The parameters may be any values that are input into the base
calling procedure including constants, thresholds, ranges, and the like.

Fig. 17B also illustrates that the system is able to combine data from
multiple experiments (including various tilings) for easier reading by the user. The
sample sequence 440-2A was shown in Fig. 17A and has been expanded in Fig. 17B
to show that the base calls are derived from multiple experiments, where the data
from multiple experiments may be from a single chip or multiple chips. In other
words, the nucleotide sequence shown for sample sequence 440-2A in Figs. 17A and
17B does not represent a single experiment but actually a combination or consensus
from multiple experiments. The user is able to review the data from each of the
experiments as shown in Fig. 17B, which includes displaying the hybridization
intensities for each related base. The system allows the user to highlight a base
position for analysis as is shown for base position 100.

Referring again to Fig. 17A, a screen icon plus sign is displayed in
front of the name of the sample sequence "440-2A." By activating the screen icon,
the system displays each of the individual calls that make up the composite base call.
As shown in Fig. 17B, the composite base call is derived from multiple base calls.
The multiple base calls are aligned with the composite base call according to base
position. The invention provides great flexibility to the user in displaying, hiding,
and summarizing data for analyzing sequences.

Monitoring Gene Expression

Fig. 18 shows a high level flowchart of a process of monitoring the
expression of a gene by comparing hybridization intensities of pairs of perfect
match and mismatch probes. The term "perfect match probe" refers to a probe that
has a sequence that is perfectly complementary to a particular target sequence.
The test probe is typically perfectly complementary to a portion (subsequence) of
the target sequence. The term "mismatch control" or "mismatch probe" refers to
probes whose sequence is deliberately selected not to be perfectly complementary to
a particular target sequence. For each mismatch (MM) control in a high-density
array there typically exists a corresponding perfect match (PM) probe that is
perfectly complementary to the same particular target sequence.

The process compares hybridization intensities of pairs of perfect
match and mismatch probes that are preferably covalently attached to the surface of
a substrate or chip. Most preferably, the nucleic acid probes have a density greater
than about 60 different nucleic acid probes per 1 cm^2 of the substrate. Although
the flowcharts show a sequence of steps for clarity, this is not an indication that the
steps must be performed in this specific order. One of ordinary skill in the art
would readily recognize that many of the steps may be reordered, combined, and
deleted without departing from the invention.

Initially, nucleic acid probes are selected that are complementary to
the target sequence (or gene). These probes are the perfect match probes.
Another set of probes is specified that are intended to be not perfectly
complementary to the target sequence. These probes are the mismatch probes and
each mismatch probe includes at least one nucleotide mismatch from a perfect
match probe. Accordingly, a mismatch probe and the perfect match probe from
which it was derived make up a pair of probes. As mentioned earlier, the
nucleotide mismatch is preferably near the center of the mismatch probe.

The probe lengths of the perfect match probes are typically chosen to
exhibit high hybridization affinity with the target sequence. For example, the
nucleic acid probes may be all 20-mers. However, probes of varying lengths may
also be synthesized on the substrate for any number of reasons including resolving
ambiguities.

The target sequence is typically fragmented, labeled and exposed to a
substrate including the nucleic acid probes as described earlier. The hybridization
intensities of the nucleic acid probes are then measured and input into a computer
system. The computer system may be the same system that directs the substrate
hybridization or it may be a different system altogether. Of course, any computer
system for use with the invention should have available other details of the
experiment including possibly the gene name, gene sequence, probe sequences,
probe locations on the substrate, and the like.

Referring to Fig. 18, after hybridization, the computer system receives
input of hybridization intensities of the multiple pairs of perfect match and
mismatch probes at step 902. The hybridization intensities indicate hybridization
affinity between the nucleic acid probes and the target nucleic acid (which
corresponds to a gene). Each pair includes a perfect match probe that is perfectly
complementary to a portion of the target nucleic acid and a mismatch probe that
differs from the perfect match probe by at least one nucleotide.

At step 904, the computer system compares the hybridization
intensities of the perfect match and mismatch probes of each pair. If the gene is
expressed, the hybridization intensity (or affinity) of a perfect match probe of a pair
should be recognizably higher than the corresponding mismatch probe. Generally,
if the hybridization intensities of a pair of probes are substantially the same, it may
indicate the gene is not expressed. However, the determination is not based on a
single pair of probes; the determination of whether a gene is expressed is based on
an analysis of many pairs of probes. An exemplary process of comparing the
hybridization intensities of the pairs of probes will be described in more detail in
reference to Fig. 19.

After the system compares the hybridization intensity of the perfect
match and mismatch probes, the system indicates expression of the gene at step 906.
As an example, the system may indicate an expression call to a user that the gene is
either present (expressed), marginal or absent (unexpressed).

Fig. 19 shows a flowchart of a process of determining if a gene is
expressed utilizing a decision matrix. At step 952, the computer system receives
raw scan data of N pairs of perfect match and mismatch probes. In a preferred
embodiment, the hybridization intensities are photon counts from a fluorescein
labeled target that hybridized to the probes on the substrate. For simplicity,
the hybridization intensity of a perfect match probe will be designated "I_pm" and the
hybridization intensity of a mismatch probe will be designated "I_mm".

Hybridization intensities for a pair of probes are retrieved at step 954.
The background signal intensity is subtracted from each of the hybridization
intensities of the pair at step 956. Background subtraction may also be performed
on all the raw scan data at the same time.

At step 958, the hybridization intensities of the pair of probes are
compared to a difference threshold (D) and a ratio threshold (R). It is determined
if the difference between the hybridization intensities of the pair (I_pm - I_mm) is greater
than or equal to the difference threshold AND the quotient of the hybridization
intensities of the pair (I_pm / I_mm) is greater than or equal to the ratio threshold. The
difference thresholds are typically user defined values that have been determined to
produce accurate expression monitoring of a gene or genes. In one embodiment,
the difference threshold is 20 and the ratio threshold is 1.2.

If I_pm - I_mm >= D and I_pm / I_mm >= R, the value NPOS is incremented at
step 960. In general, NPOS is a value that indicates the number of pairs of probes
which have hybridization intensities indicating that the gene is likely expressed.
NPOS is utilized in a determination of the expression of the gene.

At step 962, it is determined if I_mm - I_pm >= D and I_mm / I_pm >= R. If
this expression is true, the value NNEG is incremented at step 964. In general,
NNEG is a value that indicates the number of pairs of probes which have
hybridization intensities indicating that the gene is likely not expressed. NNEG,
like NPOS, is utilized in a determination of the expression of the gene.

For each pair that exhibits hybridization intensities either indicating
the gene is expressed or not expressed, a log ratio value (LR) and intensity
difference value (IDIF) are calculated at step 966. LR is calculated by the log of the
quotient of the hybridization intensities of the pair (I_pm / I_mm). The IDIF is
calculated by the difference between the hybridization intensities of the pair (I_pm -
I_mm). If there is a next pair of hybridization intensities at step 968, they are
retrieved at step 954.
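
The loop over probe pairs (steps 954 through 968) can be sketched as follows.
The function name and input shape are hypothetical, intensities are assumed to
be positive after background subtraction, and the use of a base-10 logarithm is
an assumption because the text only says "the log".

```python
from math import log10


def tally_pairs(pairs, diff_threshold, ratio_threshold):
    """pairs: iterable of (i_pm, i_mm) background-subtracted intensities."""
    npos = nneg = 0
    log_ratios, diffs = [], []
    for i_pm, i_mm in pairs:
        positive = (i_pm - i_mm >= diff_threshold
                    and i_pm / i_mm >= ratio_threshold)
        negative = (i_mm - i_pm >= diff_threshold
                    and i_mm / i_pm >= ratio_threshold)
        if positive:
            npos += 1
        elif negative:
            nneg += 1
        else:
            continue  # pair is neither positive nor negative: no LR or IDIF
        log_ratios.append(log10(i_pm / i_mm))  # LR (log base is assumed)
        diffs.append(i_pm - i_mm)              # IDIF
    return npos, nneg, log_ratios, diffs
```

Only pairs that increment NPOS or NNEG contribute an LR and an IDIF value,
matching the description above.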

At step 972, a decision matrix is utilized to indicate if the gene is
expressed. The decision matrix utilizes the values N, NPOS, NNEG, and LR
(multiple LRs). The following four assignments are performed:

P1 = NPOS / NNEG
P2 = NPOS / N
P3 = (10 * SUM(LR)) / (NPOS + NNEG)

These P values are then utilized to determine if the gene is expressed.

For purposes of illustration, the P values are broken down into ranges.
If P1 is greater than or equal to 2.1, then A is true. If P1 is less than 2.1 and greater
than or equal to 1.8, then B is true. Otherwise, C is true. Thus, P1 is broken down
into ranges A, B and C. This is done to aid the reader's understanding of the
invention.

Thus, all of the P values are broken down into ranges according to the
following:

A = (P1 >= 2.1)
B = (2.1 > P1 >= 1.8)
C = (P1 < 1.8)

X = (P2 >= 0.35)
Y = (0.35 > P2 >= 0.20)
Z = (P2 < 0.20)

Q = (P3 >= 1.5)
R = (1.5 > P3 >= 1.1)
S = (P3 < 1.1)

Once the P values are broken down into ranges according to the above boolean
values, the gene expression is determined.

The gene expression is indicated as present (expressed), marginal or
absent (not expressed). The gene is indicated as expressed if the following
expression is true: A and (X or Y) and (Q or R). In other words, the gene is
indicated as expressed if P1 >= 2.1, P2 >= 0.20 and P3 >= 1.1. Additionally, the
gene is indicated as expressed if the following expression is true: B and X and Q.

With the foregoing explanation, the following is a summary of the gene
expression indications:

Present     A and (X or Y) and (Q or R)
            B and X and Q

Marginal    A and X and S
            B and X and R
            B and Y and (Q or R)

Absent      All other cases (e.g., any C combination)

In the output to the user, present may be indicated as "P," marginal as "M" and
absent as "A" at step 974.
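
The decision matrix can be sketched as a single function that returns "P", "M"
or "A". The function name is hypothetical. Two points are assumptions of the
sketch rather than statements from the text: range S is taken to be P3 < 1.1,
and P1 is treated as infinite when NNEG is zero.

```python
def expression_call(n, npos, nneg, log_ratios):
    """n: number of probe pairs; log_ratios: LR values of the scored pairs."""
    p1 = npos / nneg if nneg else float("inf")  # assumption for NNEG == 0
    p2 = npos / n
    scored = npos + nneg
    p3 = 10 * sum(log_ratios) / scored if scored else 0.0

    a, b = p1 >= 2.1, 1.8 <= p1 < 2.1
    x, y = p2 >= 0.35, 0.20 <= p2 < 0.35
    q, r = p3 >= 1.5, 1.1 <= p3 < 1.5
    s = p3 < 1.1  # assumed complement of Q and R

    if (a and (x or y) and (q or r)) or (b and x and q):
        return "P"  # present
    if (a and x and s) or (b and x and r) or (b and y and (q or r)):
        return "M"  # marginal
    return "A"      # absent, including every C combination
```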

Once all the pairs of probes have been processed and the expression of
the gene indicated, an average of ten times the LRs is computed at step 975.
Additionally, an average of the IDIF values for the probes that incremented NPOS
and NNEG is calculated, which may be utilized as an expression level. These
values may be utilized for quantitative comparisons of this experiment with other
experiments.

Quantitative measurements may be performed at step 976. For
example, the current experiment may be compared to a previous experiment (e.g.,
utilizing values calculated at step 970). Additionally, the experiment may be
compared to hybridization intensities of RNA (such as from bacteria) present in the
biological sample in a known quantity. In this manner, one may verify the
correctness of the gene expression indication or call, modify threshold values, or
perform any number of modifications of the preceding.

For simplicity, Fig. 19 was described in reference to a single gene.
However, the process may be utilized on multiple genes in a biological sample.
Therefore, any discussion of the analysis of a single gene is not an indication that
the process may not be extended to processing multiple genes.

Fig. 20 shows a screen display layout of gene expression monitoring
software. A screen display 1000 is divided into two sections: a graphics display
area 1002 and a data display area 1004. The graphics display area is for displaying
graphs which will aid the user in interpreting the data. The data display area is for
displaying the underlying data so the user may evaluate the underlying data for
gene expression.

As will be shown in subsequent screen displays, the data display area
is preferably organized in a table having rows and columns. Each column has a
heading indicating the data that resides in the column. Each row represents data
from a single experiment or combination of experiments for a gene. The term
"experiment" is used herein to describe a process that created data. For example, a
single image file of a hybridized chip may produce many "experiments" for a
number of genes. Additionally, experiments may refer to data obtained from
different chips.

Fig. 21A shows a screen display illustrating the analysis of a selected
gene. A screen display 1030 includes a graphics display area that illustrates with
bar graphs the hybridization intensities of perfect match probes and mismatch
probes at each base position in a selected gene. The gene selected is shown
highlighted in a data display area 1034.

The data display area includes a number of column headings. The
Experiment Name refers to a user-defined name for the experiment. The Gene
Name is the name of the gene. The numbers Positive and Negative refer to the
values NPOS and NNEG as described in reference to Fig. 19. The Pairs column
indicates the number of perfect match and mismatch probe pairs that were utilized
in the analysis of the gene. The Fraction column indicates the fraction of probe
pairs that were scored as positive (i.e., Positive/Pairs).

The Avg Ratio column indicates the average of I_pm/I_mm for all probes
for a gene. The Log Avg column indicates the average of the log(I_pm/I_mm). The
PM Excess column indicates the number of perfect match probes that exhibit a
hybridization intensity above a user defined threshold. The MM Excess indicates
the number of mismatch probes that exhibit a hybridization intensity above a user
defined threshold. Referring now to Fig. 21B, the Pos/Neg column indicates the ratio
of the Positive column to the Negative column ("Inf" is utilized if the Negative
column includes a zero). The Avg Diff column indicates the average intensity
difference for the gene. The average intensity difference was computed at step 975
of Fig. 19 (i.e., average(IDIF)).
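
As a rough illustration of how several of these columns could be derived, the following sketch computes the Fraction, Avg Ratio, Log Avg and Pos/Neg values for one gene. The function name, the argument names and the dictionary keys are hypothetical, the logarithm is assumed to be base 10, and every mismatch intensity is assumed to be positive after background subtraction.

```python
import math

def summarize_gene(pairs, npos, nneg):
    """Summarize one gene.

    pairs: list of (I_pm, I_mm) hybridization intensities, all assumed > 0.
    npos, nneg: counts of positive and negative scoring probe pairs.
    """
    n = len(pairs)
    ratios = [pm / mm for pm, mm in pairs]
    return {
        "fraction": npos / n,                                # Positive / Pairs
        "avg_ratio": sum(ratios) / n,                        # mean of I_pm / I_mm
        "log_avg": sum(math.log10(r) for r in ratios) / n,   # mean log ratio (base 10 assumed)
        # "Inf" is reported when the Negative column is zero
        "pos_neg": math.inf if nneg == 0 else npos / nneg,
    }
```

Reporting infinity for a zero Negative count mirrors the "Inf" entry described for the Pos/Neg column.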

The Abs Call column indicates the gene expression call for the
experiment. The values in this column may be "P" for present, "M" for marginal
and "A" for absent. The gene expression call for a preferred embodiment is
described in more detail in reference to step 974 of Fig. 19.

As the user selects an experiment, the graphics display area displays
graphs to aid the user in interpreting the data. A button bar 1034 enables the user
to select which graph or graphs to display in the graphics display area.
Additionally, the user is able to sort the data in the data display area according to
values in a selected column.

Fig. 22 shows another screen display illustrating the analysis of a
selected gene. A screen display 1060 includes a graphics display area 1062
illustrating a graph of the ratio of the hybridization intensity of the perfect match
probe to the mismatch probe at each base position. The x-axis is the base position
and the y-axis is the ratio of hybridization intensities. The statistical ratio
threshold is plotted on the graph, which in this example is 1.2. This graph may be
utilized by the user to analyze how many probe pairs (I_pm/I_mm) are above or below
the threshold. The graph also includes the gene and experiment names.

Fig. 23 shows a screen display illustrating the comparison of
experiments for selected genes. A screen display 1160 includes a graphics display
area 1162 and a data display area 1164. The graphics display area includes a graph
of the ratio of the hybridization intensity of the perfect match probe to the mismatch
probe at each base position for each of the experiments/genes selected in the data
display area. In a preferred embodiment, the experiment name, gene name, and
data plot are a different color for each gene to allow the user to more easily see the
differences between or among selected genes.

Fig. 24 shows another screen display illustrating the comparison of
experiments for selected genes. A screen display 1200 includes a graphics display
area 1202 illustrating the expression levels of genes selected in a data display area
1204. The graph of the expression levels of the selected genes is a bar graph. In a
preferred embodiment, the expression level is defined as the average intensity
difference (see average(IDIF) in Fig. 19). The graph also includes the gene and
experiment names.

Fig. 25 shows another screen display illustrating the comparison of
experiments for selected genes with multiple graphs in the graphics display area.
A screen display 1230 includes a graphics display area 1232 depicting multiple
graphs for analyzing the genes selected in a data display area 1234. An expression
level graph 1236, an average intensity difference graph 1238 and a hybridization
intensity graph 1240 are shown for the selected genes.

Figs. 26A and 26B show the flow of a process of determining the
expression of a gene by comparing baseline scan data and experimental scan data.
For example, the baseline scan data may be from a biological sample where it is
known the gene is expressed. Thus, this scan data may be compared to a different
biological sample to determine if the gene is expressed. Additionally, it may be
determined how the expression of a gene or genes changes over time in a biological
organism. Accordingly, the term "baseline" means that it will be used as a point of
reference.

At step 1302, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the baseline. The hybridization intensity
of a perfect match probe from the baseline will be designated "I_pm" and the
hybridization intensity of a mismatch probe from the baseline will be designated
"I_mm." The background signal intensity is subtracted from each of the hybridization
intensities of the pairs of baseline scan data at step 1304.

At step 1306, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the experimental biological sample. The
hybridization intensity of a perfect match probe from the experiment will be
designated "J_pm" and the hybridization intensity of a mismatch probe from the
experiment will be designated "J_mm." The background signal intensity is subtracted
from each of the hybridization intensities of the pairs of experimental scan data at
step 1308.

The hybridization intensities of an I and J pair may be normalized at
step 1310. For example, the hybridization intensities of the I and J pairs may be
divided by the hybridization intensity of control probes.

At step 1312, the hybridization intensities of the I and J pair of probes
are compared to a difference threshold (DDIF) and a ratio threshold (RDIF). It is
determined if the difference between the hybridization intensities of the one pair
(J_pm - J_mm) and the other pair (I_pm - I_mm) is greater than or equal to the
difference threshold AND the quotient of the hybridization intensities of one pair
(J_pm - J_mm) and the other pair (I_pm - I_mm) is greater than or equal to the ratio
threshold. The difference thresholds are typically user defined values that have
been determined to produce accurate expression monitoring of a gene or genes.

If (J_pm - J_mm) - (I_pm - I_mm) >= DDIF and (J_pm - J_mm) / (I_pm - I_mm) >= RDIF,
the value NINC is incremented at step 1314. In general, NINC is a value that
indicates the experimental pair of probes indicates that the gene expression is likely
greater (or increased) than the baseline sample. NINC is utilized in a
determination of whether the expression of the gene is greater (or increased), less
(or decreased) or did not change in the experimental sample compared to the
baseline sample.

At step 1316, it is determined if (I_pm - I_mm) - (J_pm - J_mm) >= DDIF and
(I_pm - I_mm) / (J_pm - J_mm) >= RDIF. If this expression is true, NDEC is incremented.
In general, NDEC is a value that indicates the experimental pair of probes indicates
that the gene expression is likely less (or decreased) than the baseline sample.
NDEC is utilized in a determination of whether the expression of the gene is greater
(or increased), less (or decreased) or did not change in the experimental sample
compared to the baseline sample.
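
The counting performed at steps 1312 through 1316 may be sketched as follows. This is a minimal sketch under stated assumptions rather than the described implementation: the function and argument names are hypothetical, the intensities are assumed to be background-subtracted already, and a pair whose denominator difference is not positive is simply not counted, a case the text does not address.

```python
def count_changes(baseline, experiment, ddif, rdif):
    """Count probe pairs indicating increased (NINC) or decreased (NDEC) expression.

    baseline:   list of (I_pm, I_mm) intensities from the baseline scan
    experiment: list of (J_pm, J_mm) intensities from the experimental scan
    ddif, rdif: user defined difference and ratio thresholds
    """
    ninc = ndec = 0
    for (i_pm, i_mm), (j_pm, j_mm) in zip(baseline, experiment):
        i_diff = i_pm - i_mm
        j_diff = j_pm - j_mm
        # Increase: experiment exceeds baseline by both difference and ratio
        if j_diff - i_diff >= ddif and i_diff > 0 and j_diff / i_diff >= rdif:
            ninc += 1
        # Decrease: baseline exceeds experiment by both difference and ratio
        elif i_diff - j_diff >= ddif and j_diff > 0 and i_diff / j_diff >= rdif:
            ndec += 1
    return ninc, ndec
```

A pair that satisfies neither test contributes to neither count, which corresponds to no detectable change for that probe pair.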

For each of the pairs that exhibits hybridization intensities either
indicating the gene is expressed more or less in the experimental sample, the values
NPOS, NNEG and LR are calculated for each pair of probes. These values are
calculated as discussed above in reference to Fig. 19. A suffix of either "B" or "E"
has been added to each value in order to indicate if the value denotes the baseline
sample or the experimental sample, respectively. If there are next pairs of
hybridization intensities at step 1322, they are processed in a similar manner as
shown.

Referring now to Fig. 26B, an absolute decision computation is
performed for both the baseline and experimental samples at step 1324. The
absolute decision computation is an indication of whether the gene is expressed,
marginal or absent in each of the baseline and experimental samples. Accordingly,
in a preferred embodiment, this step entails performing steps 972 and 974 from Fig.
19 for each of the samples. This being done, there is an indication of gene
expression for each of the samples taken alone.

At step 1326, a decision matrix is utilized to determine the difference
in gene expression between the two samples. This decision matrix utilizes the
values N, NPOSB, NPOSE, NNEGB, NNEGE, NINC, NDEC, LRB, and LRE as they
were calculated above. The decision matrix performs different calculations
depending on whether NINC is greater than or equal to NDEC. The calculations
are as follows.

If NINC >= NDEC, the following four P values are determined:

P1 = NINC / NDEC
P2 = NINC / N
P3 = ((NPOSE - NPOSB) - (NNEGE - NNEGB)) / N
P4 = 10 * SUM(LRE - LRB) / N

These P values are then utilized to determine the difference in gene expression
between the two samples.

For purposes of illustration, the P values are broken down into ranges
as was done previously. Thus, all of the P values are broken down into ranges
according to the following:

A = (P1 >= 2.8)
B = (2.8 > P1 >= 2.0)
C = (P1 < 2.0)

X = (P2 >= 0.34)
Y = (0.34 > P2 >= 0.24)
Z = (P2 < 0.24)

O = (P3 < 0.12)

Q = (P4 >= 0.9)
R = (0.9 > P4 >= 0.5)
S = (P4 < 0.5)

Once the P values are broken down into ranges according to the above boolean
values, the difference in gene expression between the two samples is determined.
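
The range assignment for P1, P2 and P4 may be sketched as below. The function name is hypothetical and the sketch is illustrative only. P3 is left out because only its lowest range (O) is given in this passage, so the boundaries separating M and N are not assumed here.

```python
def classify_ranges(p1, p2, p4):
    """Map P1, P2 and P4 to the range letters used by the decision matrix."""
    # P1: A >= 2.8, B in [2.0, 2.8), C < 2.0
    p1_range = "A" if p1 >= 2.8 else "B" if p1 >= 2.0 else "C"
    # P2: X >= 0.34, Y in [0.24, 0.34), Z < 0.24
    p2_range = "X" if p2 >= 0.34 else "Y" if p2 >= 0.24 else "Z"
    # P4: Q >= 0.9, R in [0.5, 0.9), S < 0.5
    p4_range = "Q" if p4 >= 0.9 else "R" if p4 >= 0.5 else "S"
    return p1_range, p2_range, p4_range
```

Because the ranges for each P value are mutually exclusive and exhaustive, exactly one letter is returned per value.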

In this case where NINC >= NDEC, the gene expression change is
indicated as increased, marginal increase or no change. The following is a
summary of the gene expression indications:

Increased    A and (X or Y) and (Q or R) and (M or N or O)
             A and (X or Y) and (Q or R or S) and (M or N)
             B and (X or Y) and (Q or R) and (M or N)
             A and X and (Q or R or S) and (M or N or O)

Marginal     A or Y or S or O
Increase     B and (X or Y) and (Q or R) and O
             B and (X or Y) and S and (M or N)
             C and (X or Y) and (Q or R) and (M or N)

No Change    All other cases (e.g., any Z combination)

In the output to the user, increased may be indicated as "I," marginal increase as
"MI" and no change as "NC."

If NINC < NDEC, the following four P values are determined:

P1 = NDEC / NINC
P2 = NDEC / N
P3 = ((NNEGE - NNEGB) - (NPOSE - NPOSB)) / N
P4 = 10 * SUM(LRE - LRB) / N

These P values are then utilized to determine the difference in gene expression
between the two samples.
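
Both branches of the P value computation may be sketched together as follows. This is an illustrative sketch only. The function and argument names are hypothetical, the operator following the constant 10 in P4 is read as multiplication, LRB and LRE are assumed to be per-pair lists whose differences are summed, and an infinite P1 is assumed when the denominator count is zero, a case the text does not address.

```python
import math

def p_values(n, ninc, ndec, nposb, npose, nnegb, nnege, lrb, lre):
    """Return (P1, P2, P3, P4) for the comparison decision matrix."""
    # P4 has the same form in both branches (multiplication by 10 is assumed)
    p4 = 10 * sum(e - b for e, b in zip(lre, lrb)) / n
    if ninc >= ndec:
        # Candidate increase: ratios are taken with NINC in the numerator
        p1 = math.inf if ndec == 0 else ninc / ndec
        p2 = ninc / n
        p3 = ((npose - nposb) - (nnege - nnegb)) / n
    else:
        # Candidate decrease: roles of NINC/NDEC and of POS/NEG are swapped
        p1 = math.inf if ninc == 0 else ndec / ninc
        p2 = ndec / n
        p3 = ((nnege - nnegb) - (npose - nposb)) / n
    return p1, p2, p3, p4
```

Swapping the roles in the second branch keeps P1 at or above 1 and makes P3 positive when the evidence favors the direction being tested, so the same ranges can be applied to either case.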

The P values are broken down into the same ranges as for the other
case where NINC >= NDEC. Thus, P values in this case indicate the same ranges
and will not be repeated for the sake of brevity. However, the ranges generally
indicate different changes in the gene expression between the two samples as
shown below.

In this case where NINC < NDEC, the gene expression change is
indicated as decreased, marginal decrease or no change. The following is a
summary of the gene expression indications:

Decreased    A and (X or Y) and (Q or R) and (M or N or O)
             A and (X or Y) and (Q or R or S) and (M or N)
             B and (X or Y) and (Q or R) and (M or N)
             A and X and (Q or R or S) and (M or N or O)

Marginal     A or Y or S or O
Decrease     B and (X or Y) and (Q or R) and O
             B and (X or Y) and S and (M or N)
             C and (X or Y) and (Q or R) and (M or N)

No Change    All other cases (e.g., any Z combination)

In the output to the user, decreased may be indicated as "D," marginal decrease as
"MD" and no change as "NC."

The above has shown that the relative difference between the gene
expression between a baseline sample and an experimental sample may be
determined. An additional test may be performed that would change an I, MI, D,
or MD (i.e., not NC) call to NC if the gene is indicated as expressed in both samples
(e.g., from step 1324) and the following expressions are all true:

Average(IDIFB) >= 200
Average(IDIFE) >= 200
1.4 >= Average(IDIFE) / Average(IDIFB) >= 0.7

Thus, when a gene is expressed in both samples, a call of increase or decrease
(whether marginal or not) will be changed to a no change call if the average
intensity difference for each sample is relatively large or substantially the same for
both samples. The IDIFB and IDIFE are calculated as the sum of all the IDIFs for
each sample divided by N.
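
The additional test may be sketched as a post-processing step on the difference call. The sketch is illustrative only. The function and argument names are hypothetical, "expressed" is assumed to correspond to a "P" absolute call, and the thresholds are read as 200 for each average and 0.7 to 1.4 for the quotient.

```python
def apply_no_change_override(diff_call, baseline_call, experiment_call,
                             avg_idif_b, avg_idif_e):
    """Change an I, MI, D or MD call to NC when both samples look alike."""
    if diff_call == "NC":
        return diff_call
    # The gene must be indicated as expressed in both samples
    both_expressed = baseline_call == "P" and experiment_call == "P"
    # Both averages are large and their quotient lies between 0.7 and 1.4;
    # the first comparison also guarantees a nonzero denominator
    similar = (avg_idif_b >= 200 and avg_idif_e >= 200
               and 0.7 <= avg_idif_e / avg_idif_b <= 1.4)
    return "NC" if both_expressed and similar else diff_call
```

For example, an "I" call with averages of 300 and 330 in two expressed samples would be changed to "NC", whereas averages of 300 and 600 would leave the "I" call in place.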

At step 1328, values for quantitative difference evaluation are
calculated. An average of ((J_pm - J_mm) - (I_pm - I_mm)) for each of the pairs is
calculated. Additionally, a quotient of the average of J_pm - J_mm and the average of
I_pm - I_mm is calculated. These values may be utilized to compare the results with
other experiments in step 1330.
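
The two quantities computed at step 1328 may be sketched as follows. The function and argument names are hypothetical, and the average baseline difference is assumed to be nonzero so that the quotient is defined.

```python
def quantitative_difference(baseline, experiment):
    """Return (average change, quotient of averages) for paired probe data.

    baseline:   list of (I_pm, I_mm) intensities
    experiment: list of (J_pm, J_mm) intensities
    """
    n = len(baseline)
    i_diffs = [pm - mm for pm, mm in baseline]
    j_diffs = [pm - mm for pm, mm in experiment]
    # Average of (J_pm - J_mm) - (I_pm - I_mm) over all pairs
    avg_change = sum(j - i for i, j in zip(i_diffs, j_diffs)) / n
    # Quotient of the average experimental and average baseline differences
    ratio_of_averages = (sum(j_diffs) / n) / (sum(i_diffs) / n)
    return avg_change, ratio_of_averages
```

The first value expresses the change in absolute intensity units, while the second expresses it as a fold change relative to the baseline.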

Fig. 27A shows a screen display illustrating the monitoring of the
change of gene expression between experiments. A screen display 1400 includes a
graphics display area 1402 and a data display area 1404. A user begins the
comparison of experiments for a gene by selecting two experiments for a gene. For
simplicity, we will call one baseline data and the other experimental data, meaning
it will be compared to the baseline. For example, a user may select two
experiments for the gene with the name "gl8250(i". A comparison of two
experiments is an experiment itself, so the user is able to enter an experiment name,
which was entered as "foo" in the data display area of Fig. 27A. Fig. 27B shows
another screen display illustrating monitoring of the change of gene expression
between experiments.

The system then determines the change in gene expression between
the selected experiments according to the process described in Figs. 26A and 26B.
The data display area includes columns denoting the data produced by this
comparison. The Experiment Name refers to a user-defined name for the
comparison experiment. The Gene Name is the name of the gene. The numbers
Inc and Dec refer to the values NINC and NDEC as described in reference to Fig.
26A. More specifically, Inc refers to the number of base positions in the gene for
which the difference and ratio of the perfect match and mismatch hybridization
intensities are significantly greater in the experimental data.

The Inc Ratio column indicates the number of base positions where the
hybridization intensity increased divided by the total number of base positions in
the gene which are analyzed. The Dec Ratio column indicates the number of base
positions where the hybridization intensity decreased divided by the total number
of base positions in the gene which are analyzed. The Pos Change column
indicates the difference in the number of positive scoring probe pairs in the
experimental data versus the baseline data. The Neg Change column indicates the
difference in the number of negative scoring probe pairs (perfect match and
mismatch) in the experimental data versus the baseline data.

The Inc/Dec column indicates the number of probe pairs which had an
increase in hybridization intensity in the experimental data versus the number of
probe pairs which had a decrease in hybridization intensity in the experimental data.
The Avg Diff column indicates the average intensity difference in the experimental
data.

The Diff Call column (not shown) indicates the change in expression
level between the experiments for the gene. The column shows an "I" for increased
gene expression, "MI" for marginal increased gene expression, "D" for decreased
gene expression, "MD" for marginal decreased gene expression, "NC" for no change,
and a distinct symbol for unknown. In a preferred embodiment, the change in
expression level is calculated as described in reference to step 1326 of Fig. 26B.

In addition to calculating the change in gene expression, the user may
also select graphs to analyze the data. Graphics display area 1402 shows three
different graphs depicting the data from the baseline and experimental data.

Fig. 28 shows a screen display illustrating a three-dimensional bar
graph which illustrates the change of gene expression between experiments. A
screen display 1440 displays a graphical display area 1442 including a three-
dimensional bar graph of the expression level of selected genes in a data display
area 1444. The user selects one or more genes in the data display area and then
instructs the system to generate a three-dimensional bar graph of the expression
level of these genes, where the expression level in a preferred embodiment is the
average intensity difference (i.e., average(IDIF)). The three-dimensional bar graph
allows the user to easily view the expression level of multiple genes. Additionally,
similar genes selected from multiple experiments may be shown simultaneously and
rotated to display differences in expression levels.

Conclusion

The above description is illustrative and not restrictive. Many
variations of the invention will become apparent to those of skill in the art upon
review of this disclosure. Merely by way of example, while the invention is
illustrated with particular reference to the evaluation of DNA (natural or unnatural),
the methods can be used in the analysis from chips with other materials synthesized
thereon, such as RNA. The scope of the invention should, therefore, be determined
not with reference to the above description, but instead should be determined with
reference to the appended claims along with their full scope of equivalents.

4 Brief Description of Drawings 

Fig. 1 illustrates an example of a computer system that may be used to
execute software embodiments of the present invention;

Fig. 2 shows a system block diagram of a typical computer system;

Fig. 3 illustrates an overall system for forming and analyzing arrays of
biological materials such as DNA or RNA;

Fig. 4 is an illustration of an embodiment of software for the overall
system;

Fig. 5 illustrates the global layout of a chip formed in the overall
system;

Fig. 6 illustrates conceptually the binding of nucleic acid probes on
chips to a labeled target;

Fig. 7 illustrates nucleic acid probes arranged in lanes on a chip;

Fig. 8 illustrates a hybridization pattern of a target on a chip with a
reference sequence as in Fig. 7;

Fig. 9 illustrates standard and alternate tilings;

Fig. 10 shows a screen display of hybridization intensities from a chip;

Fig. 11 is a flowchart of a process of computing a call from
hybridization intensities of related probes;

Fig. 12 is a flowchart of another process of computing a basecall from
hybridization intensities of related probes;

Fig. 13 is a flowchart of a process of calling bases in a group of units;

Fig. 14 is a flowchart of a process of calling bases for multiple groups
of units;

Fig. 15 is a flowchart of a process of calling a base for a group of units;

Fig. 16 is a flowchart of a process of selecting a best group of units for
performing a basecall;

Figs. 17A and 17B show screen displays allowing analysis of
nucleotides from experiments from one or more chips;

Fig. 18 shows a high level flowchart of a process of monitoring the
expression of a gene by comparing hybridization intensities of pairs of perfect
match and mismatch probes;

Fig. 19 shows a flowchart of a process of determining if a gene is
expressed utilizing a decision matrix;

Fig. 20 shows a screen display layout of gene expression monitoring
software;

Figs. 21A and 21B show screen displays illustrating the analysis of a
selected gene;

Fig. 22 shows another screen display illustrating the analysis of a
selected gene;

Fig. 23 shows a screen display illustrating the comparison of
experiments for selected genes;

Fig. 24 shows another screen display illustrating the comparison of
experiments for selected genes;

Fig. 25 shows another screen display illustrating the comparison of
experiments for selected genes with multiple graphs in the graphics display area;

Figs. 26A and 26B show a flowchart of a process of determining the
expression of a gene by comparing baseline scan data and experimental scan data;

Figs. 27A and 27B show screen displays illustrating the monitoring of
the change of gene expression between experiments; and

Fig. 28 shows a screen display illustrating a three-dimensional bar
graph which illustrates the change of gene expression between experiments.



